summaryrefslogtreecommitdiff
path: root/doc/development/database
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database')
-rw-r--r--doc/development/database/add_foreign_key_to_existing_column.md2
-rw-r--r--doc/development/database/avoiding_downtime_in_migrations.md491
-rw-r--r--doc/development/database/background_migrations.md504
-rw-r--r--doc/development/database/client_side_connection_pool.md7
-rw-r--r--doc/development/database/database_lab.md101
-rw-r--r--doc/development/database/database_reviewer_guidelines.md2
-rw-r--r--doc/development/database/deleting_migrations.md39
-rw-r--r--doc/development/database/index.md10
-rw-r--r--doc/development/database/keyset_pagination.md6
-rw-r--r--doc/development/database/layout_and_access_patterns.md61
-rw-r--r--doc/development/database/loose_foreign_keys.md423
-rw-r--r--doc/development/database/migrations_for_multiple_databases.md390
-rw-r--r--doc/development/database/multiple_databases.md169
-rw-r--r--doc/development/database/not_null_constraints.md2
-rw-r--r--doc/development/database/post_deployment_migrations.md81
-rw-r--r--doc/development/database/rename_database_tables.md2
-rw-r--r--doc/development/database/strings_and_the_text_data_type.md4
-rw-r--r--doc/development/database/table_partitioning.md2
18 files changed, 2143 insertions, 153 deletions
diff --git a/doc/development/database/add_foreign_key_to_existing_column.md b/doc/development/database/add_foreign_key_to_existing_column.md
index d74f826cc14..bfd455ef9da 100644
--- a/doc/development/database/add_foreign_key_to_existing_column.md
+++ b/doc/development/database/add_foreign_key_to_existing_column.md
@@ -123,7 +123,7 @@ end
Validating the foreign key scans the whole table and makes sure that each relation is correct.
NOTE:
-When using [background migrations](../background_migrations.md), foreign key validation should happen in the next GitLab release.
+When using [background migrations](background_migrations.md), foreign key validation should happen in the next GitLab release.
Migration file for validating the foreign key:
diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md
new file mode 100644
index 00000000000..ad2768397e6
--- /dev/null
+++ b/doc/development/database/avoiding_downtime_in_migrations.md
@@ -0,0 +1,491 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Avoiding downtime in migrations
+
+When working with a database certain operations may require downtime. Since we
+cannot have downtime in migrations we need to use a set of steps to get the
+same end result without downtime. This guide describes various operations that
+may appear to need downtime, their impact, and how to perform them without
+requiring downtime.
+
+## Dropping Columns
+
+Removing columns is tricky because running GitLab processes may still be using
+the columns. To work around this safely, you will need three steps in three releases:
+
+1. Ignoring the column (release M)
+1. Dropping the column (release M+1)
+1. Removing the ignore rule (release M+2)
+
+The reason we spread this out across three releases is that dropping a column is
+a destructive operation that can't be rolled back easily.
+
+Following this procedure helps us to make sure there are no deployments to GitLab.com
+and upgrade processes for self-managed installations that lump together any of these steps.
+
+### Step 1: Ignoring the column (release M)
+
+The first step is to ignore the column in the application code. This is
+necessary because Rails caches the columns and re-uses this cache in various
+places. This can be done by defining the columns to ignore. For example, to ignore
+`updated_at` in the User model you'd use the following:
+
+```ruby
+class User < ApplicationRecord
+ include IgnorableColumns
+ ignore_column :updated_at, remove_with: '12.7', remove_after: '2020-01-22'
+end
+```
+
+Multiple columns can be ignored, too:
+
+```ruby
+ignore_columns %i[updated_at created_at], remove_with: '12.7', remove_after: '2020-01-22'
+```
+
+If the model exists in CE and EE, the column has to be ignored in the CE model. If the
+model only exists in EE, then it has to be added there.
+
+We require indication of when it is safe to remove the column ignore with:
+
+- `remove_with`: set to a GitLab release typically two releases (M+2) after adding the
+ column ignore.
+- `remove_after`: set to a date after which we consider it safe to remove the column
+ ignore, typically after the M+1 release date, during the M+2 development cycle.
+
+This information allows us to reason better about column ignores and makes sure we
+don't remove column ignores too early for both regular releases and deployments to GitLab.com. For
+example, this avoids a situation where we deploy a bulk of changes that include both changes
+to ignore the column and subsequently remove the column ignore (which would result in a downtime).
+
+In this example, the change to ignore the column went into release 12.5.
+
+### Step 2: Dropping the column (release M+1)
+
+Continuing our example, dropping the column goes into a _post-deployment_ migration in release 12.6:
+
+```ruby
+ remove_column :user, :updated_at
+```
+
+### Step 3: Removing the ignore rule (release M+2)
+
+With the next release, in this example 12.7, we set up another merge request to remove the ignore rule.
+This removes the `ignore_column` line and - if not needed anymore - also the inclusion of `IgnoreableColumns`.
+
+This should only get merged with the release indicated with `remove_with` and once
+the `remove_after` date has passed.
+
+## Renaming Columns
+
+Renaming columns the normal way requires downtime as an application may continue
+using the old column name during/after a database migration. To rename a column
+without requiring downtime we need two migrations: a regular migration, and a
+post-deployment migration. Both these migration can go in the same release.
+
+### Step 1: Add The Regular Migration
+
+First we need to create the regular migration. This migration should use
+`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the
+renaming. For example
+
+```ruby
+# A regular migration in db/migrate
+class RenameUsersUpdatedAtToUpdatedAtTimestamp < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ rename_column_concurrently :users, :updated_at, :updated_at_timestamp
+ end
+
+ def down
+ undo_rename_column_concurrently :users, :updated_at, :updated_at_timestamp
+ end
+end
+```
+
+This will take care of renaming the column, ensuring data stays in sync, and
+copying over indexes and foreign keys.
+
+If a column contains one or more indexes that don't contain the name of the
+original column, the previously described procedure will fail. In that case,
+you'll first need to rename these indexes.
+
+### Step 2: Add A Post-Deployment Migration
+
+The renaming procedure requires some cleaning up in a post-deployment migration.
+We can perform this cleanup using
+`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`:
+
+```ruby
+# A post-deployment migration in db/post_migrate
+class CleanupUsersUpdatedAtRename < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp
+ end
+
+ def down
+ undo_cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp
+ end
+end
+```
+
+If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), please carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet.
+With [Canary](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/) it is possible that the system runs in this state for a significant amount of time.
+
+## Changing Column Constraints
+
+Adding or removing a `NOT NULL` clause (or another constraint) can typically be
+done without requiring downtime. However, this does require that any application
+changes are deployed _first_. Thus, changing the constraints of a column should
+happen in a post-deployment migration.
+
+Avoid using `change_column` as it produces an inefficient query because it re-defines
+the whole column type.
+
+You can check the following guides for each specific use case:
+
+- [Adding foreign-key constraints](../migration_style_guide.md#adding-foreign-key-constraints)
+- [Adding `NOT NULL` constraints](not_null_constraints.md)
+- [Adding limits to text columns](strings_and_the_text_data_type.md)
+
+## Changing Column Types
+
+Changing the type of a column can be done using
+`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This
+method works similarly to `rename_column_concurrently`. For example, let's say
+we want to change the type of `users.username` from `string` to `text`.
+
+### Step 1: Create A Regular Migration
+
+A regular migration is used to create a new column with a temporary name along
+with setting up some triggers to keep data in sync. Such a migration would look
+as follows:
+
+```ruby
+# A regular migration in db/migrate
+class ChangeUsersUsernameStringToText < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ change_column_type_concurrently :users, :username, :text
+ end
+
+ def down
+ undo_change_column_type_concurrently :users, :username
+ end
+end
+```
+
+### Step 2: Create A Post Deployment Migration
+
+Next we need to clean up our changes using a post-deployment migration:
+
+```ruby
+# A post-deployment migration in db/post_migrate
+class ChangeUsersUsernameStringToTextCleanup < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ cleanup_concurrent_column_type_change :users, :username
+ end
+
+ def down
+ undo_cleanup_concurrent_column_type_change :users, :username, :string
+ end
+end
+```
+
+And that's it, we're done!
+
+### Casting data to a new type
+
+Some type changes require casting data to a new type. For example when changing from `text` to `jsonb`.
+In this case, use the `type_cast_function` option.
+Make sure there is no bad data and the cast will always succeed. You can also provide a custom function that handles
+casting errors.
+
+Example migration:
+
+```ruby
+ def up
+ change_column_type_concurrently :users, :settings, :jsonb, type_cast_function: 'jsonb'
+ end
+```
+
+## Changing The Schema For Large Tables
+
+While `change_column_type_concurrently` and `rename_column_concurrently` can be
+used for changing the schema of a table without downtime, it doesn't work very
+well for large tables. Because all of the work happens in sequence the migration
+can take a very long time to complete, preventing a deployment from proceeding.
+They can also produce a lot of pressure on the database due to it rapidly
+updating many rows in sequence.
+
+To reduce database pressure you should instead use a background migration
+when migrating a column in a large table (for example, `issues`). This will
+spread the work / load over a longer time period, without slowing down deployments.
+
+For more information, see [the documentation on cleaning up background
+migrations](background_migrations.md#cleaning-up).
+
+## Adding Indexes
+
+Adding indexes does not require downtime when `add_concurrent_index`
+is used.
+
+See also [Migration Style Guide](../migration_style_guide.md#adding-indexes)
+for more information.
+
+## Dropping Indexes
+
+Dropping an index does not require downtime.
+
+## Adding Tables
+
+This operation is safe as there's no code using the table just yet.
+
+## Dropping Tables
+
+Dropping tables can be done safely using a post-deployment migration, but only
+if the application no longer uses the table.
+
+## Renaming Tables
+
+Renaming tables requires downtime as an application may continue
+using the old table name during/after a database migration.
+
+If the table and the ActiveRecord model is not in use yet, removing the old
+table and creating a new one is the preferred way to "rename" the table.
+
+Renaming a table is possible without downtime by following our multi-release
+[rename table process](rename_database_tables.md#rename-table-without-downtime).
+
+## Adding Foreign Keys
+
+Adding foreign keys usually works in 3 steps:
+
+1. Start a transaction
+1. Run `ALTER TABLE` to add the constraint(s)
+1. Check all existing data
+
+Because `ALTER TABLE` typically acquires an exclusive lock until the end of a
+transaction this means this approach would require downtime.
+
+GitLab allows you to work around this by using
+`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method
+ensures that no downtime is needed.
+
+## Removing Foreign Keys
+
+This operation does not require downtime.
+
+## Migrating `integer` primary keys to `bigint`
+
+To [prevent the overflow risk](https://gitlab.com/groups/gitlab-org/-/epics/4785) for some tables
+with `integer` primary key (PK), we have to migrate their PK to `bigint`. The process to do this
+without downtime and causing too much load on the database is described below.
+
+### Initialize the conversion and start migrating existing data (release N)
+
+To start the process, add a regular migration to create the new `bigint` columns. Use the provided
+`initialize_conversion_of_integer_to_bigint` helper. The helper also creates a database trigger
+to keep in sync both columns for any new records ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072312_initialize_conversion_of_ci_stages_to_bigint.rb)):
+
+```ruby
+class InitializeConversionOfCiStagesToBigint < ActiveRecord::Migration[6.1]
+ include Gitlab::Database::MigrationHelpers
+
+ TABLE = :ci_stages
+ COLUMNS = %i(id)
+
+ def up
+ initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+ end
+
+ def down
+ revert_initialize_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+ end
+end
+```
+
+Ignore the new `bigint` columns:
+
+```ruby
+module Ci
+ class Stage < Ci::ApplicationRecord
+ include IgnorableColumns
+ ignore_column :id_convert_to_bigint, remove_with: '14.2', remove_after: '2021-08-22'
+ end
+```
+
+To migrate existing data, we introduced new type of _batched background migrations_.
+Unlike the classic background migrations, built on top of Sidekiq, batched background migrations
+don't have to enqueue and schedule all the background jobs at the beginning.
+They also have other advantages, like automatic tuning of the batch size, better progress visibility,
+and collecting metrics. To start the process, use the provided `backfill_conversion_of_integer_to_bigint`
+helper ([example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072346_backfill_ci_stages_for_bigint_conversion.rb)):
+
+```ruby
+class BackfillCiStagesForBigintConversion < ActiveRecord::Migration[6.1]
+ include Gitlab::Database::MigrationHelpers
+
+ TABLE = :ci_stages
+ COLUMNS = %i(id)
+
+ def up
+ backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+ end
+
+ def down
+ revert_backfill_conversion_of_integer_to_bigint(TABLE, COLUMNS)
+ end
+end
+```
+
+### Monitor the background migration
+
+Check how the migration is performing while it's running. Multiple ways to do this are described below.
+
+#### High-level status of batched background migrations
+
+See how to [check the status of batched background migrations](../../update/index.md#checking-for-background-migrations-before-upgrading).
+
+#### Query the database
+
+We can query the related database tables directly. Requires access to read-only replica.
+Example queries:
+
+```sql
+-- Get details for batched background migration for given table
+SELECT * FROM batched_background_migrations WHERE table_name = 'namespaces'\gx
+
+-- Get count of batched background migration jobs by status for given table
+SELECT
+ batched_background_migrations.id, batched_background_migration_jobs.status, COUNT(*)
+FROM
+ batched_background_migrations
+ JOIN batched_background_migration_jobs ON batched_background_migrations.id = batched_background_migration_jobs.batched_background_migration_id
+WHERE
+ table_name = 'namespaces'
+GROUP BY
+ batched_background_migrations.id, batched_background_migration_jobs.status;
+
+-- Batched background migration progress for given table (based on estimated total number of tuples)
+SELECT
+ m.table_name,
+ LEAST(100 * sum(j.batch_size) / pg_class.reltuples, 100) AS percentage_complete
+FROM
+ batched_background_migrations m
+ JOIN batched_background_migration_jobs j ON j.batched_background_migration_id = m.id
+ JOIN pg_class ON pg_class.relname = m.table_name
+WHERE
+ j.status = 3 AND m.table_name = 'namespaces'
+GROUP BY m.id, pg_class.reltuples;
+```
+
+#### Sidekiq logs
+
+We can also use the Sidekiq logs to monitor the worker that executes the batched background
+migrations:
+
+1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address.
+1. Change the index pattern to `pubsub-sidekiq-inf-gprd*`.
+1. Add filter for `json.queue: cronjob:database_batched_background_migration`.
+
+#### PostgreSQL slow queries log
+
+Slow queries log keeps track of low queries that took above 1 second to execute. To see them
+for batched background migration:
+
+1. Sign in to [Kibana](https://log.gprd.gitlab.net) with a `@gitlab.com` email address.
+1. Change the index pattern to `pubsub-postgres-inf-gprd*`.
+1. Add filter for `json.endpoint_id.keyword: Database::BatchedBackgroundMigrationWorker`.
+1. Optional. To see only updates, add a filter for `json.command_tag.keyword: UPDATE`.
+1. Optional. To see only failed statements, add a filter for `json.error_severity.keyword: ERROR`.
+1. Optional. Add a filter by table name.
+
+#### Grafana dashboards
+
+To monitor the health of the database, use these additional metrics:
+
+- [PostgreSQL Tuple Statistics](https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m): if you see high rate of updates for the tables being actively converted, or increasing percentage of dead tuples for this table, it might mean that autovacuum cannot keep up.
+- [PostgreSQL Overview](https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1): if you see high system usage or transactions per second (TPS) on the primary database server, it might mean that the migration is causing problems.
+
+### Prometheus metrics
+
+Number of [metrics](https://gitlab.com/gitlab-org/gitlab/-/blob/294a92484ce4611f660439aa48eee4dfec2230b5/lib/gitlab/database/background_migration/batched_migration_wrapper.rb#L90-128)
+for each batched background migration are published to Prometheus. These metrics can be searched for and
+visualized in Thanos ([see an example](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=3d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-06-13%2012%3A18%3A24&g0.moment_input=2021-06-13%2012%3A18%3A24)).
+
+### Swap the columns (release N + 1)
+
+After the background is completed and the new `bigint` columns are populated for all records, we can
+swap the columns. Swapping is done with post-deployment migration. The exact process depends on the
+table being converted, but in general it's done in the following steps:
+
+1. Using the provided `ensure_batched_background_migration_is_finished` helper, make sure the batched
+migration has finished ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)).
+If the migration has not completed, the subsequent steps fail anyway. By checking in advance we
+aim to have more helpful error message.
+1. Create indexes using the `bigint` columns that match the existing indexes using the `integer`
+column ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L28-34)).
+1. Create foreign keys (FK) using the `bigint` columns that match the existing FKs using the
+`integer` column. Do this both for FK referencing other tables, and FKs that reference the table
+that is being migrated ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L36-43)).
+1. Inside a transaction, swap the columns:
+ 1. Lock the tables involved. To reduce the chance of hitting a deadlock, we recommended to do this in parent to child order ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L47)).
+ 1. Rename the columns to swap names ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L49-54))
+ 1. Reset the trigger function ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L56-57)).
+ 1. Swap the defaults ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L59-62)).
+ 1. Swap the PK constraint (if any) ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L64-68)).
+ 1. Remove old indexes and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L70-72)).
+ 1. Remove old FKs (if still present) and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L74)).
+
+See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088), and [migration](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb).
+
+### Remove the trigger and old `integer` columns (release N + 2)
+
+Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper,
+drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)).
+
+### Remove ignore rules (release N + 3)
+
+In the next release after the columns were dropped, remove the ignore rules as we do not need them
+anymore ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71161)).
+
+## Data migrations
+
+Data migrations can be tricky. The usual approach to migrate data is to take a 3
+step approach:
+
+1. Migrate the initial batch of data
+1. Deploy the application code
+1. Migrate any remaining data
+
+Usually this works, but not always. For example, if a field's format is to be
+changed from JSON to something else we have a bit of a problem. If we were to
+change existing data before deploying application code we'll most likely run
+into errors. On the other hand, if we were to migrate after deploying the
+application code we could run into the same problems.
+
+If you merely need to correct some invalid data, then a post-deployment
+migration is usually enough. If you need to change the format of data (for example, from
+JSON to something else) it's typically best to add a new column for the new data
+format, and have the application use that. In such a case the procedure would
+be:
+
+1. Add a new column in the new format
+1. Copy over existing data to this new column
+1. Deploy the application code
+1. In a post-deployment migration, copy over any remaining data
+
+In general there is no one-size-fits-all solution, therefore it's best to
+discuss these kind of migrations in a merge request to make sure they are
+implemented in the best way possible.
diff --git a/doc/development/database/background_migrations.md b/doc/development/database/background_migrations.md
new file mode 100644
index 00000000000..1f7e0d76c89
--- /dev/null
+++ b/doc/development/database/background_migrations.md
@@ -0,0 +1,504 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Background migrations
+
+WARNING:
+Background migrations are strongly discouraged in favor of the new [batched background migrations framework](../batched_background_migrations.md).
+Please check that documentation and determine if that framework suits your needs and fall back
+to these only if required.
+
+Background migrations should be used to perform data migrations whenever a
+migration exceeds [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take). For example, you can use background
+migrations to migrate data that's stored in a single JSON column
+to a separate table instead.
+
+If the database cluster is considered to be in an unhealthy state, background
+migrations automatically reschedule themselves for a later point in time.
+
+## When To Use Background Migrations
+
+You should use a background migration when you migrate _data_ in tables that have
+so many rows that the process would exceed [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take) if performed using a regular Rails migration.
+
+- Background migrations should be used when migrating data in [high-traffic tables](../migration_style_guide.md#high-traffic-tables).
+- Background migrations may also be used when executing numerous single-row queries
+for every item on a large dataset. Typically, for single-record patterns, runtime is
+largely dependent on the size of the dataset, hence it should be split accordingly
+and put into background migrations.
+- Background migrations should not be used to perform schema migrations.
+
+Some examples where background migrations can be useful:
+
+- Migrating events from one table to multiple separate tables.
+- Populating one column based on JSON stored in another column.
+- Migrating data that depends on the output of external services (for example, an API).
+
+NOTE:
+If the background migration is part of an important upgrade, make sure it's announced
+in the release post. Discuss with your Project Manager if you're not sure the migration falls
+into this category.
+
+## Isolation
+
+Background migrations must be isolated and can not use application code (for example,
+models defined in `app/models`). Since these migrations can take a long time to
+run it's possible for new versions to be deployed while they are still running.
+
+It's also possible for different migrations to be executed at the same time.
+This means that different background migrations should not migrate data in a
+way that would cause conflicts.
+
+## Idempotence
+
+Background migrations are executed in a context of a Sidekiq process.
+Usual Sidekiq rules apply, especially the rule that jobs should be small
+and idempotent.
+
+See [Sidekiq best practices guidelines](https://github.com/mperham/sidekiq/wiki/Best-Practices)
+for more details.
+
+Make sure that in case that your migration job is going to be retried data
+integrity is guaranteed.
+
+## Background migrations for EE-only features
+
+All the background migration classes for EE-only features should be present in GitLab CE.
+For this purpose, an empty class can be created for GitLab CE, and it can be extended for GitLab EE
+as explained in the [guidelines for implementing Enterprise Edition features](../ee_features.md#code-in-libgitlabbackground_migration).
+
+## How It Works
+
+Background migrations are simple classes that define a `perform` method. A
+Sidekiq worker will then execute such a class, passing any arguments to it. All
+migration classes must be defined in the namespace
+`Gitlab::BackgroundMigration`, the files should be placed in the directory
+`lib/gitlab/background_migration/`.
+
+## Scheduling
+
+Scheduling a background migration should be done in a post-deployment
+migration that includes `Gitlab::Database::MigrationHelpers`
+To do so, simply use the following code while
+replacing the class name and arguments with whatever values are necessary for
+your migration:
+
+```ruby
+migrate_in('BackgroundMigrationClassName', [arg1, arg2, ...])
+```
+
+You can use the function `queue_background_migration_jobs_by_range_at_intervals`
+to automatically split the job into batches:
+
+```ruby
+queue_background_migration_jobs_by_range_at_intervals(
+ ClassName,
+ BackgroundMigrationClassName,
+ 2.minutes,
+ batch_size: 10_000
+ )
+```
+
+You'll also need to make sure that newly created data is either migrated, or
+saved in both the old and new version upon creation. For complex and time
+consuming migrations it's best to schedule a background job using an
+`after_create` hook so this doesn't affect response timings. The same applies to
+updates. Removals in turn can be handled by simply defining foreign keys with
+cascading deletes.
+
+### Rescheduling background migrations
+
+If one of the background migrations contains a bug that is fixed in a patch
+release, the background migration needs to be rescheduled so the migration would
+be repeated on systems that already performed the initial migration.
+
+When you reschedule the background migration, make sure to turn the original
+scheduling into a no-op by clearing up the `#up` and `#down` methods of the
+migration performing the scheduling. Otherwise the background migration would be
+scheduled multiple times on systems that are upgrading multiple patch releases at
+once.
+
+When you start the second post-deployment migration, you should delete any
+previously queued jobs from the initial migration with the provided
+helper:
+
+```ruby
+delete_queued_jobs('BackgroundMigrationClassName')
+```
+
+## Cleaning Up
+
+NOTE:
+Cleaning up any remaining background migrations _must_ be done in either a major
+or minor release, you _must not_ do this in a patch release.
+
+Because background migrations can take a long time you can't immediately clean
+things up after scheduling them. For example, you can't drop a column that's
+used in the migration process as this would cause jobs to fail. This means that
+you'll need to add a separate _post deployment_ migration in a future release
+that finishes any remaining jobs before cleaning things up (for example, removing a
+column).
+
+As an example, say you want to migrate the data from column `foo` (containing a
+big JSON blob) to column `bar` (containing a string). The process for this would
+roughly be as follows:
+
+1. Release A:
+ 1. Create a migration class that performs the migration for a row with a given ID.
+ You can use [background jobs tracking](#background-jobs-tracking) to simplify cleaning up.
+ 1. Deploy the code for this release, this should include some code that will
+ schedule jobs for newly created data (for example, using an `after_create` hook).
+ 1. Schedule jobs for all existing rows in a post-deployment migration. It's
+ possible some newly created rows may be scheduled twice so your migration
+ should take care of this.
+1. Release B:
+ 1. Deploy code so that the application starts using the new column and stops
+ scheduling jobs for newly created data.
+ 1. In a post-deployment migration, finalize all jobs that have not succeeded by now.
+ If you used [background jobs tracking](#background-jobs-tracking) in release A,
+ you can use `finalize_background_migration` from `BackgroundMigrationHelpers` to ensure no jobs remain.
+ This helper will:
+ 1. Use `Gitlab::BackgroundMigration.steal` to process any remaining
+ jobs in Sidekiq.
+ 1. Reschedule the migration to be run directly (that is, not through Sidekiq)
+ on any rows that weren't migrated by Sidekiq. This can happen if, for
+ instance, Sidekiq received a SIGKILL, or if a particular batch failed
+ enough times to be marked as dead.
+ 1. Remove `Gitlab::Database::BackgroundMigrationJob` rows where
+ `status = succeeded`. To retain diagnostic information that may
+ help with future bug tracking you can skip this step by specifying
+ the `delete_tracking_jobs: false` parameter.
+ 1. Remove the old column.
+
+This may also require a bump to the [import/export version](../../user/project/settings/import_export.md), if
+importing a project from a prior version of GitLab requires the data to be in
+the new format.
+
+## Example
+
+To explain all this, let's use the following example: the table `integrations` has a
+field called `properties` which is stored in JSON. For all rows you want to
+extract the `url` key from this JSON object and store it in the `integrations.url`
+column. There are millions of integrations and parsing JSON is slow, thus you can't
+do this in a regular migration.
+
+To do this using a background migration we'll start with defining our migration
+class:
+
+```ruby
+class Gitlab::BackgroundMigration::ExtractIntegrationsUrl
+ class Integration < ActiveRecord::Base
+ self.table_name = 'integrations'
+ end
+
+ def perform(start_id, end_id)
+ Integration.where(id: start_id..end_id).each do |integration|
+ json = JSON.load(integration.properties)
+
+ integration.update(url: json['url']) if json['url']
+ rescue JSON::ParserError
+ # If the JSON is invalid we don't want to keep the job around forever,
+ # instead we'll just leave the "url" field to whatever the default value
+ # is.
+ next
+ end
+ end
+end
+```
+
+Next we'll need to adjust our code so we schedule the above migration for newly
+created and updated integrations. We can do this using something along the lines of
+the following:
+
+```ruby
+class Integration < ActiveRecord::Base
+ after_commit :schedule_integration_migration, on: :update
+ after_commit :schedule_integration_migration, on: :create
+
+ def schedule_integration_migration
+ BackgroundMigrationWorker.perform_async('ExtractIntegrationsUrl', [id, id])
+ end
+end
+```
+
+We're using `after_commit` here to ensure the Sidekiq job is not scheduled
+before the transaction completes as doing so can lead to race conditions where
+the changes are not yet visible to the worker.
+
+Next we'll need a post-deployment migration that schedules the migration for
+existing data.
+
+```ruby
+class ScheduleExtractIntegrationsUrl < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ MIGRATION = 'ExtractIntegrationsUrl'
+ DELAY_INTERVAL = 2.minutes
+
+ def up
+ queue_background_migration_jobs_by_range_at_intervals(
+ define_batchable_model('integrations'),
+ MIGRATION,
+ DELAY_INTERVAL)
+ end
+
+ def down
+ end
+end
+```
+
+Once deployed our application will continue using the data as before but at the
+same time will ensure that both existing and new data is migrated.
+
+In the next release we can remove the `after_commit` hooks and related code. We
+will also need to add a post-deployment migration that consumes any remaining
+jobs and manually run on any un-migrated rows. Such a migration would look like
+this:
+
+```ruby
+class ConsumeRemainingExtractIntegrationsUrlJobs < Gitlab::Database::Migration[1.0]
+ disable_ddl_transaction!
+
+ def up
+ # This must be included
+ Gitlab::BackgroundMigration.steal('ExtractIntegrationsUrl')
+
+ # This should be included, but can be skipped - see below
+ define_batchable_model('integrations').where(url: nil).each_batch(of: 50) do |batch|
+ range = batch.pluck('MIN(id)', 'MAX(id)').first
+
+ Gitlab::BackgroundMigration::ExtractIntegrationsUrl.new.perform(*range)
+ end
+ end
+
+ def down
+ end
+end
+```
+
+The final step runs for any un-migrated rows after all of the jobs have been
+processed. This is in case a Sidekiq process running the background migrations
+received SIGKILL, leading to the jobs being lost. (See
+[more reliable Sidekiq queue](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/36791) for more information.)
+
+If the application does not depend on the data being 100% migrated (for
+instance, the data is advisory, and not mission-critical), then this final step
+can be skipped.
+
+This migration will then process any jobs for the ExtractIntegrationsUrl migration
+and continue once all jobs have been processed. Once done you can safely remove
+the `integrations.properties` column.
+
+## Testing
+
+It is required to write tests for:
+
+- The background migrations' scheduling migration.
+- The background migration itself.
+- A cleanup migration.
+
+The `:migration` and `schema: :latest` RSpec tags are automatically set for
+background migration specs.
+See the
+[Testing Rails migrations](../testing_guide/testing_migrations_guide.md#testing-a-non-activerecordmigration-class)
+style guide.
+
+Keep in mind that `before` and `after` RSpec hooks are going
+to migrate you database down and up, which can result in other background
+migrations being called. That means that using `spy` test doubles with
+`have_received` is encouraged, instead of using regular test doubles, because
+your expectations defined in a `it` block can conflict with what is being
+called in RSpec hooks. See [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
+for more details.
+
+## Best practices
+
+1. Make sure to know how much data you're dealing with.
+1. Make sure that background migration jobs are idempotent.
+1. Make sure that tests you write are not false positives.
+1. Make sure that if the data being migrated is critical and cannot be lost, the
+ clean-up migration also checks the final state of the data before completing.
+1. When migrating many columns, make sure it won't generate too many
+ dead tuples in the process (you may need to directly query the number of dead tuples
+ and adjust the scheduling according to this piece of data).
+1. Make sure to discuss the numbers with a database specialist, the migration may add
+ more pressure on DB than you expect (measure on staging,
+ or ask someone to measure on production).
+1. Make sure to know how much time it'll take to run all scheduled migrations.
+1. Provide an estimation section in the description, estimating both the total migration
+ run time and the query times for each background migration job. Explain plans for each query
+ should also be provided.
+
+ For example, assuming a migration that deletes data, include information similar to
+ the following section:
+
+ ```plaintext
+ Background Migration Details:
+
+ 47600 items to delete
+ batch size = 1000
+ 47600 / 1000 = 48 batches
+
+ Estimated times per batch:
+ - 820ms for select statement with 1000 items (see linked explain plan)
+ - 900ms for delete statement with 1000 items (see linked explain plan)
+ Total: ~2 sec per batch
+
+ 2 mins delay per batch (safe for the given total time per batch)
+
+ 48 batches * 2 min per batch = 96 mins to run all the scheduled jobs
+ ```
+
+ The execution time per batch (2 sec in this example) is not included in the calculation
+ for total migration time. The jobs are scheduled 2 minutes apart without knowledge of
+ the execution time.
+
+## Additional tips and strategies
+
+### Nested batching
+
+A strategy to make the migration run faster is to schedule larger batches, and then use `EachBatch`
+within the background migration to perform multiple statements.
+
+The background migration helpers that queue multiple jobs such as
+`queue_background_migration_jobs_by_range_at_intervals` use [`EachBatch`](../iterating_tables_in_batches.md).
+The example above has batches of 1000, where each queued job takes two seconds. If the query has been optimized
+to make the time for the delete statement within the [query performance guidelines](../query_performance.md),
+1000 may be the largest number of records that can be deleted in a reasonable amount of time.
+
+The minimum and most common interval for delaying jobs is two minutes. This results in two seconds
+of work for each two minute job. There's nothing that prevents you from executing multiple delete
+statements in each background migration job.
+
+Looking at the example above, you could alternatively do:
+
+```plaintext
+Background Migration Details:
+
+47600 items to delete
+batch size = 10_000
+47600 / 10_000 = 5 batches
+
+Estimated times per batch:
+- Records are updated in sub-batches of 1000 => 10_000 / 1000 = 10 total updates
+- 820ms for select statement with 1000 items (see linked explain plan)
+- 900ms for delete statement with 1000 items (see linked explain plan)
+Sub-batch total: ~2 sec per sub-batch,
+Total batch time: 2 * 10 = 20 sec per batch
+
+2 mins delay per batch
+
+5 batches * 2 min per batch = 10 mins to run all the scheduled jobs
+```
+
+The batch time of 20 seconds still fits comfortably within the two minute delay, yet the total run
+time is cut by a tenth from around 100 minutes to 10 minutes! When dealing with large background
+migrations, this can cut the total migration time by days.
+
+When batching in this way, it is important to look at query times on the higher end
+of the table or relation being updated. `EachBatch` may generate some queries that become much
+slower when dealing with higher ID ranges.
+
+### Delay time
+
+When looking at the batch execution time versus the delay time, the execution time
+should fit comfortably within the delay time for a few reasons:
+
+- To allow for a variance in query times.
+- To allow autovacuum to catch up after periods of high churn.
+
+Never try to optimize by fully filling the delay window even if you are confident
+the queries themselves have no timing variance.
+
+### Background jobs tracking
+
+NOTE:
+Background migrations with job tracking enabled must call `mark_all_as_succeeded` for its batch, even if no work is needed to be done.
+
+`queue_background_migration_jobs_by_range_at_intervals` can create records for each job that is scheduled to run.
+You can enable this behavior by passing `track_jobs: true`. Each record starts with a `pending` status. Make sure that your worker updates the job status to `succeeded` by calling `Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded` in the `perform` method of your background migration.
+
+```ruby
+# Background migration code
+
+def perform(start_id, end_id)
+ # do work here
+
+ mark_job_as_succeeded(start_id, end_id)
+end
+
+private
+
+# Make sure that the arguments passed here match those passed to the background
+# migration
+def mark_job_as_succeeded(*arguments)
+ Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded(
+ self.class.name.demodulize,
+ arguments
+ )
+end
+```
+
+```ruby
+# Post deployment migration
+MIGRATION = 'YourBackgroundMigrationName'
+DELAY_INTERVAL = 2.minutes.to_i # can be different
+BATCH_SIZE = 10_000 # can be different
+
+disable_ddl_transaction!
+
+def up
+ queue_background_migration_jobs_by_range_at_intervals(
+ define_batchable_model('name_of_the_table_backing_the_model'),
+ MIGRATION,
+ DELAY_INTERVAL,
+ batch_size: BATCH_SIZE,
+ track_jobs: true
+ )
+end
+
+def down
+ # no-op
+end
+```
+
+See [`lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb) for a full example.
+
+#### Rescheduling pending jobs
+
+You can reschedule pending migrations from the `background_migration_jobs` table by creating a post-deployment migration and calling `requeue_background_migration_jobs_by_range_at_intervals` with the migration name and delay interval.
+
+```ruby
+# Post deployment migration
+MIGRATION = 'YourBackgroundMigrationName'
+DELAY_INTERVAL = 2.minutes
+
+disable_ddl_transaction!
+
+def up
+ requeue_background_migration_jobs_by_range_at_intervals(MIGRATION, DELAY_INTERVAL)
+end
+
+def down
+ # no-op
+end
+```
+
+See [`db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb) for a full example.
+
+### Viewing failure error logs
+
+After running a background migration, if any jobs have failed, you can view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694).
+View the production Sidekiq log and filter for:
+
+- `json.class: BackgroundMigrationWorker`
+- `json.job_status: fail`
+- `json.meta.caller_id: <MyBackgroundMigrationSchedulingMigrationClassName>`
+- `json.args: <MyBackgroundMigrationClassName>`
+
+Looking at the `json.error_class`, `json.error_message` and `json.error_backtrace` values may be helpful in understanding why the jobs failed.
+
+Depending on when and how the failure occurred, you may find other helpful information by filtering with `json.class: <MyBackgroundMigrationClassName>`.
diff --git a/doc/development/database/client_side_connection_pool.md b/doc/development/database/client_side_connection_pool.md
index 8316a75ac8d..60c8665df87 100644
--- a/doc/development/database/client_side_connection_pool.md
+++ b/doc/development/database/client_side_connection_pool.md
@@ -1,8 +1,7 @@
---
-type: dev, reference
-stage: none
-group: Development
-info: "See the Technical Writers assigned to Development Guidelines: https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments-to-development-guidelines"
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Client-side connection-pool
diff --git a/doc/development/database/database_lab.md b/doc/development/database/database_lab.md
new file mode 100644
index 00000000000..1c8694b113d
--- /dev/null
+++ b/doc/development/database/database_lab.md
@@ -0,0 +1,101 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Database Lab and Postgres.ai
+
+Internal users at GitLab have access to the Database Lab Engine (DLE) and
+[postgres.ai](https://console.postgres.ai/) for testing performance of database queries
+on replicated production data. Unlike a typical read-only production replica, in the DLE you can
+also create, update, and delete rows. You can also test the performance of
+schema changes, like additional indexes or columns, in an isolated copy of production data.
+
+## Access Database Lab Engine
+
+Access to the DLE is helpful for:
+
+- Database reviewers and maintainers.
+- Engineers who work on merge requests that have large effects on databases.
+
+To access the DLE's services, you can:
+
+- Perform query testing in the `#database_lab` Slack channel, or in the Postgres.ai web console.
+ Employees access both services with their GitLab Google account. Query testing
+ provides `EXPLAIN` (analyze, buffers) plans for queries executed there.
+- Migration testing by triggering a job as a part of a merge request.
+- Direct `psql` access to DLE instead of a production replica. Available to authorized users only.
+ To request `psql` access, file an [access request](https://about.gitlab.com/handbook/business-technology/team-member-enablement/onboarding-access-requests/access-requests/#individual-or-bulk-access-request).
+
+For more assistance, use the `#database` Slack channel.
+
+NOTE:
+If you need only temporary access to a production replica, instead of a Database Lab
+clone, follow the runbook procedure for connecting to the
+[database console with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Database_Console_via_Teleport.md).
+This procedure is similar to [Rails console access with Teleport](https://gitlab.com/gitlab-com/runbooks/-/blob/master/docs/Teleport/Connect_to_Rails_Console_via_Teleport.md#how-to-use-teleport-to-connect-to-rails-console).
+
+### Query testing
+
+You can access Database Lab's query analysis features either:
+
+- In the `#database_lab` Slack channel. Shows everyone's commands and results, but
+ your own commands are still isolated in their own clone.
+- In [the Postgres.ai web console](https://console.postgres.ai/GitLab/joe-instances).
+ Shows only the commands you run.
+
+#### Generate query plans
+
+Query plans are an essential part of the database review process. These plans
+enable us to decide quickly if a given query can be performant on GitLab.com.
+Running the `explain` command generates an `explain` plan and a link to the Postgres.ai
+console with more query analysis. For example, running `EXPLAIN SELECT * FROM application_settings`
+does the following:
+
+1. Runs `explain (analyze, buffers) select * from application_settings;` against a database clone.
+1. Responds with timing and buffer details from the run.
+1. Provides a [detailed, shareable report on the results](https://console.postgres.ai/shared/24d543c9-893b-4ff6-8deb-a8f902f85a53).
+
+#### Making schema changes
+
+Sometimes when testing queries, a contributor may realize that the query needs an index
+or other schema change to make added queries more performant. To test the query, run the `exec` command.
+For example, running this command:
+
+```sql
+exec CREATE INDEX on application_settings USING btree (instance_administration_project_id)
+```
+
+creates the specified index on the table. You can [test queries](#generate-query-plans) leveraging
+the new index. `exec` does not return any results, only the time required to execute the query.
+
+#### Reset the clone
+
+After many changes, such as after a destructive query or an ineffective index,
+you must start over. To reset your designated clone, run `reset`.
+
+### Migration testing
+
+For information on testing migrations, review our
+[database migration testing documentation](database_migration_pipeline.md).
+
+### Access the console with `psql`
+
+Team members with [`psql` access](#access-database-lab-engine), can gain direct access
+to a clone via `psql`. Access to `psql` enables you to see data, not just metadata.
+
+To connect to a clone using `psql`:
+
+1. Create a clone from the [desired instance](https://console.postgres.ai/gitlab/instances/).
+ 1. Provide a **Clone ID**: Something that uniquely identifies your clone, such as `yourname-testing-gitlabissue`.
+ 1. Provide a **Database username** and **Database password**: Connects `psql` to your clone.
+ 1. Select **Enable deletion protection** if you want to preserve your clone. Avoid selecting this option.
+ Clones are removed after 12 hours.
+1. In the **Clone details** page of the Postgres.ai web interface, copy and run
+ the command to start SSH port forwarding for the clone.
+1. In the **Clone details** page of the Postgres.ai web interface, copy and run the `psql` connection string.
+ Use the password provided at setup.
+
+After you connect, use clone like you would any `psql` console in production, but with
+the added benefit and safety of an isolated writeable environment.
diff --git a/doc/development/database/database_reviewer_guidelines.md b/doc/development/database/database_reviewer_guidelines.md
index 9d5e4821c9f..ca9ca36b156 100644
--- a/doc/development/database/database_reviewer_guidelines.md
+++ b/doc/development/database/database_reviewer_guidelines.md
@@ -70,7 +70,7 @@ Finally, you can find various guides in the [Database guides](index.md) page tha
topics and use cases. The most frequently required during database reviewing are the following:
- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations.
-- [Avoiding downtime in migrations](../avoiding_downtime_in_migrations.md).
+- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md).
- [SQL guidelines](../sql.md) for working with SQL queries.
- [Guidelines for JiHu contributions with database migrations](https://about.gitlab.com/handbook/ceo/chief-of-staff-team/jihu-support/jihu-database-change-process.html)
diff --git a/doc/development/database/deleting_migrations.md b/doc/development/database/deleting_migrations.md
new file mode 100644
index 00000000000..be9009f365d
--- /dev/null
+++ b/doc/development/database/deleting_migrations.md
@@ -0,0 +1,39 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Delete existing migrations
+
+When removing existing migrations from the GitLab project, you have to take into account
+the possibility of the migration already been included in past releases or in the current release, and thus already executed on GitLab.com and/or in self-managed instances.
+
+Because of it, it's not possible to delete existing migrations, as that could lead to:
+
+- Schema inconsistency, as changes introduced into the database were not rolled back properly.
+- Leaving a record on the `schema_versions` table, that points out to migration that no longer exists on the codebase.
+
+Instead of deleting we can opt for disabling the migration.
+
+## Pre-requisites to disable a migration
+
+Migrations can be disabled if:
+
+- They caused a timeout or general issue on GitLab.com.
+- They are obsoleted, for example, changes are not necessary due to a feature change.
+- Migration is a data migration only, that is, the migration does not change the database schema.
+
+## How to disable a data migration?
+
+In order to disable a migration, the following steps apply to all types of migrations:
+
+1. Turn the migration into a no-op by removing the code inside `#up`, `#down`
+ or `#perform` methods, and adding `# no-op` comment instead.
+1. Add a comment explaining why the code is gone.
+
+Disabling migrations requires explicit approval of Database Maintainer.
+
+## Examples
+
+- [Disable scheduling of productivity analytics](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/17253)
diff --git a/doc/development/database/index.md b/doc/development/database/index.md
index efc48f72d00..0363d13ed4c 100644
--- a/doc/development/database/index.md
+++ b/doc/development/database/index.md
@@ -23,14 +23,15 @@ info: To determine the technical writer assigned to the Stage/Group associated w
## Migrations
-- [Avoiding downtime in migrations](../avoiding_downtime_in_migrations.md)
+- [Migrations for multiple databases](migrations_for_multiple_databases.md)
+- [Avoiding downtime in migrations](avoiding_downtime_in_migrations.md)
- [SQL guidelines](../sql.md) for working with SQL queries
- [Migrations style guide](../migration_style_guide.md) for creating safe SQL migrations
- [Testing Rails migrations](../testing_guide/testing_migrations_guide.md) guide
-- [Post deployment migrations](../post_deployment_migrations.md)
-- [Background migrations](../background_migrations.md)
+- [Post deployment migrations](post_deployment_migrations.md)
+- [Background migrations](background_migrations.md)
- [Swapping tables](../swapping_tables.md)
-- [Deleting migrations](../deleting_migrations.md)
+- [Deleting migrations](deleting_migrations.md)
- [Partitioning tables](table_partitioning.md)
## Debugging
@@ -64,6 +65,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
- [Pagination guidelines](pagination_guidelines.md)
- [Pagination performance guidelines](pagination_performance_guidelines.md)
- [Efficient `IN` operator queries](efficient_in_operator_queries.md)
+- [Data layout and access patterns](layout_and_access_patterns.md)
## Case studies
diff --git a/doc/development/database/keyset_pagination.md b/doc/development/database/keyset_pagination.md
index 4f0b353a37f..88928feb927 100644
--- a/doc/development/database/keyset_pagination.md
+++ b/doc/development/database/keyset_pagination.md
@@ -166,7 +166,7 @@ These order objects can be defined in the model classes as normal ActiveRecord s
Consider the following scope:
```ruby
-scope = Issue.where(project_id: 10).order(Gitlab::Database.nulls_last_order('relative_position', 'DESC'))
+scope = Issue.where(project_id: 10).order(Issue.arel_table[:relative_position].desc.nulls_last)
# SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 10 ORDER BY relative_position DESC NULLS LAST
scope.keyset_paginate # raises: Gitlab::Pagination::Keyset::UnsupportedScopeOrder: The order on the scope does not support keyset pagination
@@ -190,8 +190,8 @@ order = Gitlab::Pagination::Keyset::Order.build([
Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
attribute_name: 'relative_position',
column_expression: Issue.arel_table[:relative_position],
- order_expression: Gitlab::Database.nulls_last_order('relative_position', 'DESC'),
- reversed_order_expression: Gitlab::Database.nulls_first_order('relative_position', 'ASC'),
+ order_expression: Issue.arel_table[:relative_position].desc.nulls_last,
+ reversed_order_expression: Issue.arel_table[:relative_position].asc.nulls_first,
nullable: :nulls_last,
order_direction: :desc,
distinct: false
diff --git a/doc/development/database/layout_and_access_patterns.md b/doc/development/database/layout_and_access_patterns.md
new file mode 100644
index 00000000000..a3e2fefb2a3
--- /dev/null
+++ b/doc/development/database/layout_and_access_patterns.md
@@ -0,0 +1,61 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Best practices for data layout and access patterns
+
+Certain patterns of data access, and especially data updates, can exacerbate strain
+on the database. Avoid them if possible.
+
+This document lists some patterns to avoid, with recommendations for alternatives.
+
+## High-frequency updates, especially to the same row
+
+Avoid single database rows that are updated by many transactions at the same time.
+
+- If many processes attempt to update the same row simultaneously, they queue up
+ as each transaction locks the row for writing. As this can significantly increase
+ transaction timings, the Rails connection pools can saturate, leading to
+ application-wide downtime.
+- For each row update, PostgreSQL inserts a new row version and deletes the old one.
+ In high-traffic scenarios, this approach can cause vacuum and WAL (write-ahead log)
+ pressure, reducing database performance.
+
+This pattern often happens when an aggregate is too expensive to compute for each
+request, so a running tally is kept in the database. If you need such an aggregate,
+consider keeping a running total in a single row, plus a small working set of
+recently added data, such as individual increments:
+
+- When introducing new data, add it to the working set. These inserts do not
+ cause lock contention.
+- When calculating the aggregate, combine the running total with a live aggregate
+ from the working set, providing an up-to-date result.
+- Add a periodic job that incorporates the working set into the running total and
+ clears it in a transaction, bounding the amount of work needed by a reader.
+
+## Wide tables
+
+PostgreSQL organizes rows into 8 KB pages, and operates on one page at a time.
+By minimizing the width of rows in a table, we improve the following:
+
+- Sequential and bitmap index scan performance, because fewer pages must be
+ scanned if each contains more rows.
+- Vacuum performance, because vacuum can process more rows in each page.
+- Update performance, because during a (non-HOT) update, each index must be
+ updated for every row update.
+
+Mitigating wide tables is one part of the database team's
+[100 GB table initiative](../../architecture/blueprints/database_scaling/size-limits.md),
+as wider tables can fit fewer rows in 100 GB.
+
+When adding columns to a table, consider if you intend to access the data in the
+new columns by itself, in a one-to-one relationship with the other columns of the
+table. If so, the new columns could be a good candidate for splitting to a new table.
+
+Several tables have already been split in this way. For example:
+
+- `search_data` is split from `issues`.
+- `project_pages_metadata` is split from `projects`.
+- `merge_request_diff_details` is split from `merge_request_diffs`
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index 17a825b4812..2bcdc91202a 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -95,27 +95,27 @@ Created database 'gitlabhq_test_ee'
Created database 'gitlabhq_geo_test_ee'
Showing cross-schema foreign keys (20):
- ID | HAS_LFK | FROM | TO | COLUMN | ON_DELETE
- 0 | N | ci_builds | projects | project_id | cascade
- 1 | N | ci_job_artifacts | projects | project_id | cascade
- 2 | N | ci_pipelines | projects | project_id | cascade
- 3 | Y | ci_pipelines | merge_requests | merge_request_id | cascade
- 4 | N | external_pull_requests | projects | project_id | cascade
- 5 | N | ci_sources_pipelines | projects | project_id | cascade
- 6 | N | ci_stages | projects | project_id | cascade
- 7 | N | ci_pipeline_schedules | projects | project_id | cascade
- 8 | N | ci_runner_projects | projects | project_id | cascade
- 9 | Y | dast_site_profiles_pipelines | ci_pipelines | ci_pipeline_id | cascade
- 10 | Y | vulnerability_feedback | ci_pipelines | pipeline_id | nullify
- 11 | N | ci_variables | projects | project_id | cascade
- 12 | N | ci_refs | projects | project_id | cascade
- 13 | N | ci_builds_metadata | projects | project_id | cascade
- 14 | N | ci_subscriptions_projects | projects | downstream_project_id | cascade
- 15 | N | ci_subscriptions_projects | projects | upstream_project_id | cascade
- 16 | N | ci_sources_projects | projects | source_project_id | cascade
- 17 | N | ci_job_token_project_scope_links | projects | source_project_id | cascade
- 18 | N | ci_job_token_project_scope_links | projects | target_project_id | cascade
- 19 | N | ci_project_monthly_usages | projects | project_id | cascade
+ ID | HAS_LFK | FROM | TO | COLUMN | ON_DELETE
+ 0 | N | ci_builds | projects | project_id | cascade
+ 1 | N | ci_job_artifacts | projects | project_id | cascade
+ 2 | N | ci_pipelines | projects | project_id | cascade
+ 3 | Y | ci_pipelines | merge_requests | merge_request_id | cascade
+ 4 | N | external_pull_requests | projects | project_id | cascade
+ 5 | N | ci_sources_pipelines | projects | project_id | cascade
+ 6 | N | ci_stages | projects | project_id | cascade
+ 7 | N | ci_pipeline_schedules | projects | project_id | cascade
+ 8 | N | ci_runner_projects | projects | project_id | cascade
+ 9 | Y | dast_site_profiles_pipelines | ci_pipelines | ci_pipeline_id | cascade
+ 10 | Y | vulnerability_feedback | ci_pipelines | pipeline_id | nullify
+ 11 | N | ci_variables | projects | project_id | cascade
+ 12 | N | ci_refs | projects | project_id | cascade
+ 13 | N | ci_builds_metadata | projects | project_id | cascade
+ 14 | N | ci_subscriptions_projects | projects | downstream_project_id | cascade
+ 15 | N | ci_subscriptions_projects | projects | upstream_project_id | cascade
+ 16 | N | ci_sources_projects | projects | source_project_id | cascade
+ 17 | N | ci_job_token_project_scope_links | projects | source_project_id | cascade
+ 18 | N | ci_job_token_project_scope_links | projects | target_project_id | cascade
+ 19 | N | ci_project_monthly_usages | projects | project_id | cascade
To match FK write one or many filters to match against FROM/TO/COLUMN:
- scripts/decomposition/generate-loose-foreign-key <filter(s)...>
@@ -191,7 +191,7 @@ ci_pipelines:
### Track record changes
To know about deletions in the `projects` table, configure a `DELETE` trigger
-using a [post-deployment migration](../post_deployment_migrations.md). The
+using a [post-deployment migration](post_deployment_migrations.md). The
trigger needs to be configured only once. If the model already has at least one
`loose_foreign_key` definition, then this step can be skipped:
@@ -226,7 +226,7 @@ ON DELETE CASCADE;
The migration must run after the `DELETE` trigger is installed and the loose
foreign key definition is deployed. As such, it must be a [post-deployment
-migration](../post_deployment_migrations.md) dated after the migration for the
+migration](post_deployment_migrations.md) dated after the migration for the
trigger. If the foreign key is deleted earlier, there is a good chance of
introducing data inconsistency which needs manual cleanup:
@@ -480,3 +480,380 @@ it executes `occurrence.pipeline.created_at`.
When looping through the vulnerability occurrences in the Sidekiq worker, we
could try to load the corresponding pipeline and choose to skip processing that
occurrence if pipeline is not found.
+
+## Architecture
+
+The loose foreign keys feature is implemented within the `LooseForeignKeys` Ruby namespace. The
+code is isolated from the core application code and theoretically, it could be a standalone library.
+
+The feature is invoked solely in the [`LooseForeignKeys::CleanupWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/loose_foreign_keys/cleanup_worker.rb) worker class. The worker is scheduled via a
+cron job where the schedule depends on the configuration of the GitLab instance.
+
+- Non-decomposed GitLab (1 database): invoked every minute.
+- Decomposed GitLab (2 databases, CI and Main): invoked every minute, cleaning up one database
+at a time. For example, the cleanup worker for the main database runs every two minutes.
+
+To avoid lock contention and the processing of the same database rows, the worker does not run
+parallel. This behavior is ensured with a Redis lock.
+
+**Record cleanup procedure:**
+
+1. Acquire the Redis lock.
+1. Determine which database to clean up.
+1. Collect all database tables where the deletions are tracked (parent tables).
+ - This is achieved by reading the `config/gitlab_loose_foreign_keys.yml` file.
+ - A table is considered "tracked" when a loose foreign key definition exists for the table and
+ the `DELETE` trigger is installed.
+1. Cycle through the tables with an infinite loop.
+1. For each table, load a batch of deleted parent records to clean up.
+1. Depending on the YAML configuration, build `DELETE` or `UPDATE` (nullify) queries for the
+referenced child tables.
+1. Invoke the queries.
+1. Repeat until all child records are cleaned up or the maximum limit is reached.
+1. Remove the deleted parent records when all child records are cleaned up.
+
+### Database structure
+
+The feature relies on triggers installed on the parent tables. When a parent record is deleted,
+the trigger will automatically insert a new record into the `loose_foreign_keys_deleted_records`
+database table.
+
+The inserted record will store the following information about the deleted record:
+
+- `fully_qualified_table_name`: name of the database table where the record was located.
+- `primary_key_value`: the ID of the record, the value will be present in the child tables as
+the foreign key value. At the moment, composite primary keys are not supported, the parent table
+must have an `id` column.
+- `status`: defaults to pending, represents the status of the cleanup process.
+- `consume_after`: defaults to the current time.
+- `cleanup_attempts`: defaults to 0. The number of times the worker tried to clean up this record.
+A non-zero number would mean that this record has many child records and cleaning it up requires
+several runs.
+
+#### Database decomposition
+
+The `loose_foreign_keys_deleted_records` table will exist on both database servers (Ci and Main)
+after the [database decomposition](https://gitlab.com/groups/gitlab-org/-/epics/6168). The worker
+ill determine which parent tables belong to which database by reading the
+`lib/gitlab/database/gitlab_schemas.yml` YAML file.
+
+Example:
+
+- Main database tables
+ - `projects`
+ - `namespaces`
+ - `merge_requests`
+- Ci database tables
+ - `ci_builds`
+ - `ci_pipelines`
+
+When the worker is invoked for the Ci database, the worker will load deleted records only from the
+`ci_builds` and `ci_pipelines` tables. During the cleanup process, `DELETE` and `UPDATE` queries
+will mostly run on tables located in the Main database. In this example, one `UPDATE` query will
+nullify the `merge_requests.head_pipeline_id` column.
+
+#### Database partitioning
+
+Due to the large volume of inserts the database table receives daily, a special partitioning
+strategy was implemented to address data bloat concerns. Originally, the
+[time-decay](https://about.gitlab.com/company/team/structure/working-groups/database-scalability/time-decay.html)
+strategy was considered for the feature but due to the large data volume we decided to implement a
+new strategy.
+
+A deleted record is considered fully processed when all its direct children records have been
+cleaned up. When this happens, the loose foreign key worker will update the `status` column of
+the deleted record. After this step, the record is no longer needed.
+
+The sliding partitioning strategy provides an efficient way of cleaning up old, unused data by
+adding a new database partition and removing the old one when certain conditions are met.
+The `loose_foreign_keys_deleted_records` database table is list partitioned where most of the
+time there is only one partition attached to the table.
+
+```sql
+ Partitioned table "public.loose_foreign_keys_deleted_records"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+--------------+-------------
+ id | bigint | | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain | |
+ partition | bigint | | not null | 84 | plain | |
+ primary_key_value | bigint | | not null | | plain | |
+ status | smallint | | not null | 1 | plain | |
+ created_at | timestamp with time zone | | not null | now() | plain | |
+ fully_qualified_table_name | text | | not null | | extended | |
+ consume_after | timestamp with time zone | | | now() | plain | |
+ cleanup_attempts | smallint | | | 0 | plain | |
+Partition key: LIST (partition)
+Indexes:
+ "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
+ "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
+Check constraints:
+ "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
+Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_84 FOR VALUES IN ('84')
+```
+
+The `partition` column controls the insert direction, the `partition` value determines which
+partition will get the deleted rows inserted via the trigger. Notice that the default value of
+the `partition` table matches with the value of the list partition (84). In `INSERT` query
+within the trigger thevalue of the `partition` is omitted, the trigger always relies on the
+default value of the column.
+
+Example `INSERT` query for the trigger:
+
+```sql
+INSERT INTO loose_foreign_keys_deleted_records
+(fully_qualified_table_name, primary_key_value)
+SELECT TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, old_table.id FROM old_table;
+```
+
+The partition "sliding" process is controlled by two, regularly executed callbacks. These
+callbackes are defined within the `LooseForeignKeys::DeletedRecord` model.
+
+The `next_partition_if` callback controls when to create a new partition. A new partition will
+be created when the current partition has at least one record older than 24 hours. A new partition
+is added by the [`PartitionManager`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/partitioning/partition_manager.rb)
+using the following steps:
+
+1. Create a new partition, where the `VALUE` for the partition is `CURRENT_PARTITION + 1`.
+1. Update the default value of the `partition` column to `CURRENT_PARTITION + 1`.
+
+With these steps, new `INSERT`-s via the triggers will end up in the new partition. At this point,
+the database table has two partitions.
+
+The `detach_partition_if` callback determines if the old partitions can be detached from the table.
+A partition is detachable if there are no pending (unprocessed) records in the partition
+(`status = 1`). The detached partitions will be available for some time, you can see the list
+detached partitions in the `detached_partitions` table:
+
+```sql
+select * from detached_partitions;
+```
+
+#### Cleanup queries
+
+The `LooseForeignKeys::CleanupWorker` has its database query builder which depends on `Arel`.
+The feature doesn't reference any application-specific `ActiveRecord` models to avoid unexpected
+side effects. The database queries are batched, which means that several parent records are being
+cleaned up at the same time.
+
+Example `DELETE` query:
+
+```sql
+DELETE
+FROM "merge_request_metrics"
+WHERE ("merge_request_metrics"."id") IN
+ (SELECT "merge_request_metrics"."id"
+ FROM "merge_request_metrics"
+ WHERE "merge_request_metrics"."pipeline_id" IN (1, 2, 10, 20)
+ LIMIT 1000 FOR UPDATE SKIP LOCKED)
+```
+
+The primary key values of the parent records are 1, 2, 10, and 20.
+
+Example `UPDATE` (nullify) query:
+
+```sql
+UPDATE "merge_requests"
+SET "head_pipeline_id" = NULL
+WHERE ("merge_requests"."id") IN
+ (SELECT "merge_requests"."id"
+ FROM "merge_requests"
+ WHERE "merge_requests"."head_pipeline_id" IN (3, 4, 30, 40)
+ LIMIT 500 FOR UPDATE SKIP LOCKED)
+```
+
+These queries are batched, which means that in many cases, several invocations are needed to clean
+up all associated child records.
+
+The batching is implemented with loops, the processing will stop when all associated child records
+are cleaned up or the limit is reached.
+
+```ruby
+loop do
+ modification_count = process_batch_with_skip_locked
+
+ break if modification_count == 0 || over_limit?
+end
+
+loop do
+ modification_count = process_batch
+
+ break if modification_count == 0 || over_limit?
+end
+```
+
+The loop-based batch processing is preferred over `EachBatch` for the following reasons:
+
+- The records in the batch are modified, so the next batch will contain different records.
+- There is always an index on the foreign key column however, the column is usually not unique.
+`EachBatch` requires a unique column for the iteration.
+- The record order doesn't matter for the cleanup.
+
+Notice that we have two loops. The initial loop will process records with the `SKIP LOCKED` clause.
+The query will skip rows that are locked by other application processes. This will ensure that the
+cleanup worker will less likely to become blocked. The second loop will execute the database
+queries without `SKIP LOCKED` to ensure that all records have been processed.
+
+#### Processing limits
+
+A constant, large volume of record updates or deletions can cause incidents and affect the
+availability of GitLab:
+
+- Increased table bloat.
+- Increased number of pending WAL files.
+- Busy tables, difficulty when acquiring locks.
+
+To mitigate these issues, several limits are applied when the worker runs.
+
+- Each query has `LIMIT`, a query cannot process an unbounded number of rows.
+- The maximum number of record deletions and record updates is limited.
+- The maximum runtime (30 seconds) for the database queries is limited.
+
+The limit rules are implemented in the `LooseForeignKeys::ModificationTracker` class. When one of
+the limits (record modification count, time limit) is reached the processing is stopped
+immediately. After some time, the next scheduled worker will continue the cleanup process.
+
+#### Performance characteristics
+
+The database trigger on the parent tables will **decrease** the record deletion speed. Each
+statement that removes rows from the parent table will invoke the trigger to insert records
+into the `loose_foreign_keys_deleted_records` table.
+
+The queries within the cleanup worker are fairly efficient index scans, with limits in place
+they're unlikely to affect other parts of the application.
+
+The database queries are not running in transaction, when an error happens for example a statement
+timeout or a worker crash, the next job will continue the processing.
+
+## Troubleshooting
+
+### Accumulation of deleted records
+
+There can be cases where the workers need to process an unusually large amount of data. This can
+happen under normal usage, for example when a large project or group is deleted. In this scenario,
+there can be several million rows to be deleted or nullified. Due to the limits enforced by the
+worker, processing this data will take some time.
+
+When cleaning up "heavy-hitters", the feature ensures fair processing by rescheduling larger
+batches for later. This gives time for other deleted records to be processed.
+
+For example, a project with millions of `ci_builds` records is deleted. The `ci_builds` records
+will be deleted by the loose foreign keys feature.
+
+1. The cleanup worker is scheduled and picks up a batch of deleted `projects` records. The large
+project is part of the batch.
+1. Deletion of the orphaned `ci_builds` rows has started.
+1. The time limit is reached, but the cleanup is not complete.
+1. The `cleanup_attempts` column is incremented for the deleted records.
+1. Go to step 1. The next cleanup worker continues the cleanup.
+1. When the `cleanup_attempts` reaches 3, the batch is re-scheduled 10 minutes later by updating
+the `consume_after` column.
+1. The next cleanup worker will process a different batch.
+
+We have Prometheus metrics in place to monitor the deleted record cleanup:
+
+- `loose_foreign_key_processed_deleted_records`: Number of processed deleted records. When large
+cleanup happens, this number would decrease.
+- `loose_foreign_key_incremented_deleted_records`: Number of deleted records which were not
+finished processing. The `cleanup_attempts` column was incremented.
+- `loose_foreign_key_rescheduled_deleted_records`: Number of deleted records that had to be
+rescheduled at a later time after 3 cleanup attempts.
+
+Example Thanos query:
+
+```plaintext
+loose_foreign_key_rescheduled_deleted_records{env="gprd", table="ci_runners"}
+```
+
+Another way to look at the situation is by running a database query. This query gives the exact
+counts of the unprocessed records:
+
+```sql
+SELECT partition, fully_qualified_table_name, count(*)
+FROM loose_foreign_keys_deleted_records
+WHERE
+status = 1
+GROUP BY 1, 2;
+```
+
+Example output:
+
+```sql
+ partition | fully_qualified_table_name | count
+-----------+----------------------------+-------
+ 87 | public.ci_builds | 874
+ 87 | public.ci_job_artifacts | 6658
+ 87 | public.ci_pipelines | 102
+ 87 | public.ci_runners | 111
+ 87 | public.merge_requests | 255
+ 87 | public.namespaces | 25
+ 87 | public.projects | 6
+```
+
+The query includes the partition number which can be useful to detect if the cleanup process is
+significantly lagging behind. When multiple different partition values are present in the list
+that means the cleanup of some deleted records didn't finish in several days (1 new partition
+is added every day).
+
+Steps to diagnose the problem:
+
+- Check which records are accumulating.
+- Try to get an estimate of the number of remaining records.
+- Looking into the worker performance stats (Kibana or Thanos).
+
+Possible solutions:
+
+- Short-term: increase the batch sizes.
+- Long-term: invoke the worker more frequently. Parallelize the worker
+
+For a one-time fix, we can run the cleanup worker several times from the rails console. The worker
+can run parallelly however, this can introduce lock contention and it could increase the worker
+runtime.
+
+```ruby
+LooseForeignKeys::CleanupWorker.new.perform
+```
+
+When the cleanup is done, the older partitions will be automatically detached by the
+`PartitionManager`.
+
+### PartitionManager bug
+
+NOTE:
+This issue happened in the past on Staging and it has been mitigated.
+
+When adding a new partition, the default value of the `partition` column is also updated. This is
+a schema change that is executed in the same transaction as the new partition creation. It's highly
+unlikely that the `partition` column goes outdated.
+
+However, if this happens then this can cause application-wide incidents because the `partition`
+value points to a partition that doesn't exist. Symptom: deletion of records from tables where the
+`DELETE` trigger is installed fails.
+
+```sql
+\d+ loose_foreign_keys_deleted_records;
+
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+----------------------------+--------------------------+-----------+----------+----------------------------------------------------------------+----------+--------------+-------------
+ id | bigint | | not null | nextval('loose_foreign_keys_deleted_records_id_seq'::regclass) | plain | |
+ partition | bigint | | not null | 4 | plain | |
+ primary_key_value | bigint | | not null | | plain | |
+ status | smallint | | not null | 1 | plain | |
+ created_at | timestamp with time zone | | not null | now() | plain | |
+ fully_qualified_table_name | text | | not null | | extended | |
+ consume_after | timestamp with time zone | | | now() | plain | |
+ cleanup_attempts | smallint | | | 0 | plain | |
+Partition key: LIST (partition)
+Indexes:
+ "loose_foreign_keys_deleted_records_pkey" PRIMARY KEY, btree (partition, id)
+ "index_loose_foreign_keys_deleted_records_for_partitioned_query" btree (partition, fully_qualified_table_name, consume_after, id) WHERE status = 1
+Check constraints:
+ "check_1a541f3235" CHECK (char_length(fully_qualified_table_name) <= 150)
+Partitions: gitlab_partitions_dynamic.loose_foreign_keys_deleted_records_3 FOR VALUES IN ('3')
+```
+
+Check the default value of the `partition` column and compare it with the available partitions
+(4 vs 3). The partition with the value of 4 does not exist. To mitigate the problem an emergency
+schema change is required:
+
+```sql
+ALTER TABLE loose_foreign_keys_deleted_records ALTER COLUMN partition SET DEFAULT 3;
+```
diff --git a/doc/development/database/migrations_for_multiple_databases.md b/doc/development/database/migrations_for_multiple_databases.md
new file mode 100644
index 00000000000..0ec4612e985
--- /dev/null
+++ b/doc/development/database/migrations_for_multiple_databases.md
@@ -0,0 +1,390 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Migrations for Multiple databases
+
+> Support for describing migration purposes was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/73756) in GitLab 14.8.
+
+This document describes how to properly write database migrations
+for [the decomposed GitLab application using multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
+
+Learn more about general multiple databases support in a [separate document](multiple_databases.md).
+
+WARNING:
+If you experience any issues using `Gitlab::Database::Migration[2.0]`,
+you can temporarily revert back to the previous behavior by changing the version to `Gitlab::Database::Migration[1.0]`.
+Please report any issues with `Gitlab::Database::Migration[2.0]` in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/358430).
+
+The design for multiple databases (except for the Geo database) assumes
+that all decomposed databases have **the same structure** (for example, schema), but **the data is different** in each database. This means that some tables do not contain data on each database.
+
+## Operations
+
+Depending on the used constructs, we can classify migrations to be either:
+
+1. Modifying structure ([DDL - Data Definition Language](https://www.postgresql.org/docs/current/ddl.html)) (for example, `ALTER TABLE`).
+1. Modifying data ([DML - Data Manipulation Language](https://www.postgresql.org/docs/current/dml.html)) (for example, `UPDATE`).
+1. Performing [other queries](https://www.postgresql.org/docs/current/queries.html) (for example, `SELECT`) that are treated as **DML** for the purposes of our migrations.
+
+**The usage of `Gitlab::Database::Migration[2.0]` requires migrations to always be of a single purpose**.
+Migrations cannot mix **DDL** and **DML** changes as the application requires the structure
+(as described by `db/structure.sql`) to be exactly the same across all decomposed databases.
+
+### Data Definition Language (DDL)
+
+The DDL migrations are all migrations that:
+
+1. Create or drop a table (for example, `create_table`).
+1. Add or remove an index (for example, `add_index`, `add_index_concurrently`).
+1. Add or remove a foreign key (for example `add_foreign_key`, `add_foreign_key_concurrently`).
+1. Add or remove a column with or without a default value (for example, `add_column`).
+1. Create or drop trigger functions (for example, `create_trigger_function`).
+1. Attach or detach triggers from tables (for example, `track_record_deletions`, `untrack_record_deletions`).
+1. Prepare or not async indexes (for example, `prepare_async_index`, `unprepare_async_index_by_name`).
+
+As such DDL migrations **CANNOT**:
+
+1. Read or modify data in any form, via SQL statements or ActiveRecord models.
+1. Update column values (for example, `update_column_in_batches`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Read the state of feature flags since they are stored in `main:` (a `features` and `feature_gates`).
+1. Read application settings (as settings are stored in `main:`).
+
+As the majority of migrations in the GitLab codebase are of the DDL-type,
+this is also the default mode of operation and requires no further changes
+to the migrations files.
+
+#### Example: perform DDL on all databases
+
+Example migration adding a concurrent index that is treated as change of the structure (DDL)
+that is executed on all configured databases.
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+ end
+
+ def down
+ remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+ end
+end
+```
+
+### Data Manipulation Language (DML)
+
+The DML migrations are all migrations that:
+
+1. Read data via SQL statements (for example, `SELECT * FROM projects WHERE id=1`).
+1. Read data via ActiveRecord models (for example, `User < MigrationRecord`).
+1. Create, update or delete data via ActiveRecord models (for example, `User.create!(...)`).
+1. Create, update or delete data via SQL statements (for example, `DELETE FROM projects WHERE id=1`).
+1. Update columns in batches (for example, `update_column_in_batches(:projects, :archived, true)`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Access application settings (for example, `ApplicationSetting.last` if run for `main:` database).
+1. Read and modify feature flags if run for the `main:` database.
+
+The DML migrations **CANNOT**:
+
+1. Make any changes to DDL since this breaks the rule of keeping `structure.sql` coherent across
+ all decomposed databases.
+1. **Read data from another database**.
+
+To indicate the `DML` migration type, a migration must use the `restrict_gitlab_migration gitlab_schema:`
+syntax in a migration class. This marks the given migration as DML and restricts access to it.
+
+#### Example: perform DML only in context of the database containing the given `gitlab_schema`
+
+Example migration updating `archived` column of `projects` that is executed
+only for the database containing `gitlab_main` schema.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+#### Example: usage of `ActiveRecord` classes
+
+A migration using `ActiveRecord` class to perform data manipulation
+must use the `MigrationRecord` class. This class is guaranteed to provide
+a correct connection in a context of a given migration.
+
+Underneath the `MigrationRecord == ActiveRecord::Base`, as once the `db:migrate`
+runs, it switches the active connection of `ActiveRecord::Base.establish_connection :ci`.
+To avoid confusion to using the `ActiveRecord::Base`, `MigrationRecord` is required.
+
+This implies that DML migrations are forbidden to read data from other
+databases. For example, running migration in context of `ci:` and reading feature flags
+from `main:`, as no established connection to another database is present.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ class Project < MigrationRecord
+ end
+
+ def up
+ Project.where(archived: false).each_batch of |batch|
+ batch.update_all(archived: true)
+ end
+ end
+
+ def down
+ end
+end
+```
+
+### The special purpose of `gitlab_shared`
+
+As described in [gitlab_schema](multiple_databases.md#the-special-purpose-of-gitlab_shared),
+the `gitlab_shared` tables are allowed to contain data across all databases. This implies
+that such migrations should run across all databases to modify structure (DDL) or modify data (DML).
+
+As such migrations accessing `gitlab_shared` do not need to use `restrict_gitlab_migration gitlab_schema:`,
+migrations without restriction run across all databases and are allowed to modify data on each of them.
+If the `restrict_gitlab_migration gitlab_schema:` is specified, the `DML` migration
+runs only in a context of a database containing the given `gitlab_schema`.
+
+#### Example: run DML `gitlab_shared` migration on all databases
+
+Example migration updating `loose_foreign_keys_deleted_records` table
+that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
+
+This migration is executed across all configured databases.
+
+```ruby
+class DeleteAllLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ def up
+ execute("DELETE FROM loose_foreign_keys_deleted_records")
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+#### Example: run DML `gitlab_shared` only on the database containing the given `gitlab_schema`
+
+Example migration updating `loose_foreign_keys_deleted_records` table
+that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
+
+This migration since it configures restriction on `gitlab_ci` is executed only
+in context of database containing `gitlab_ci` schema.
+
+```ruby
+class DeleteCiBuildsLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ execute("DELETE FROM loose_foreign_keys_deleted_records WHERE fully_qualified_table_name='ci_builds'")
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+### The behavior of skipping migrations
+
+The only migrations that are skipped are the ones performing **DML** changes.
+The **DDL** migrations are **always and unconditionally** executed.
+
+The implemented [solution](https://gitlab.com/gitlab-org/gitlab/-/issues/355014#solution-2-use-database_tasks)
+uses the `database_tasks:` as a way to indicate which additional database configurations
+(in `config/database.yml`) share the same primary database. The database configurations
+marked with `database_tasks: false` are exempt from executing `db:migrate` for those
+database configurations.
+
+If database configurations do not share databases (all do have `database_tasks: true`),
+each migration runs for every database configuration:
+
+1. The DDL migration applies all structure changes on all databases.
+1. The DML migration runs only in the context of a database containing the given `gitlab_schema:`.
+1. If the DML migration is not eligible to run, it is skipped. It's still
+ marked as executed in `schema_migrations`. While running `db:migrate`, the skipped
+ migration outputs `Current migration is skipped since it modifies 'gitlab_ci' which is outside of 'gitlab_main, gitlab_shared`.
+
+To prevent loss of migrations if the `database_tasks: false` is configured, a dedicated
+Rake task is used [`gitlab:db:validate_config`](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83118).
+The `gitlab:db:validate_config` validates the correctness of `database_tasks:` by checking database identifiers
+of each underlying database configuration. The ones that share the database are required to have
+the `database_tasks: false` set. `gitlab:db:validate_config` always runs before `db:migrate`.
+
+## Validation
+
+Validation in a nutshell uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
+each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
+The migration is skipped if the specified `gitlab_schema` is outside of a list of schemas
+managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
+
+The `Gitlab::Database::Migration[2.0]` includes `Gitlab::Database::MigrationHelpers::RestrictGitlabSchema`
+which extends the `#migrate` method. For the duration of a migration a dedicated query analyzer
+is installed `Gitlab::Database::QueryAnalyzers::RestrictAllowedSchemas` that accepts
+a list of allowed schemas as defined by `restrict_gitlab_migration:`. If the executed query
+is outside of allowed schemas, it raises an exception.
+
+## Exceptions
+
+Depending on misuse or lack of `restrict_gitlab_migration` various exceptions can be raised
+as part of the migration run and prevent the migration from being completed.
+
+### Exception 1: migration running in DDL mode does DML select
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # Missing:
+ # restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode
+Modifying of 'projects' (gitlab_main) with 'SELECT * FROM projects...
+```
+
+The current migration do not use `restrict_gitlab_migration`. The lack indicates a migration
+running in **DDL** mode, but the executed payload appears to be reading data from `projects`.
+
+**The solution** is to add `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 2: migration running in DML mode changes the structure
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # restrict_gitlab_migration if defined indicates DML, it should be removed
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+ end
+
+ def down
+ remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+ end
+end
+```
+
+```plaintext
+DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode.
+Modifying of 'merge_request_reviewers' with 'CREATE INDEX...
+```
+
+The current migration do use `restrict_gitlab_migration`. The presence indicates **DML** mode,
+but the executed payload appears to be doing structure changes (DDL).
+
+**The solution** is to remove `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 3: migration running in DML mode accesses data from a table in another schema
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # Since it modifies `projects` it should use `gitlab_main`
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) do access 'projects' (gitlab_main) " \
+which is outside of list of allowed schemas: 'gitlab_ci'
+```
+
+The current migration do restrict the migration to `gitlab_ci`, but appears to modify
+data in `gitlab_main`.
+
+**The solution** is to change `restrict_gitlab_migration gitlab_schema: :gitlab_ci`.
+
+### Exception 4: mixing DDL and DML mode
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # This migration is invalid regardless of specification
+ # as it cannot modify structure and data at the same time
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: 'index_on_merge_request_reviewers'
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+The migrations mixing **DDL** and **DML** depending on ordering of operations raises
+one of the prior exceptions.
+
+## Upcoming changes on multiple database migrations
+
+The `restrict_gitlab_migration` using `gitlab_schema:` is considered as a first iteration
+of this feature for running migrations selectively depending on a context. It is possible
+to add additional restrictions to DML-only migrations (as the structure coherency is likely
+to stay as-is until further notice) to restrict when they run.
+
+A Potential extension is to limit running DML migration only to specific environments:
+
+```ruby
+restrict_gitlab_migration gitlab_schema: :gitlab_main, gitlab_env: :gitlab_com
+```
diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md
index c9bbf73be55..3b1b06b557c 100644
--- a/doc/development/database/multiple_databases.md
+++ b/doc/development/database/multiple_databases.md
@@ -9,141 +9,86 @@ info: To determine the technical writer assigned to the Stage/Group associated w
To scale GitLab, the we are
[decomposing the GitLab application database into multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
-## CI/CD Database
+## GitLab Schema
-> Support for configuring the GitLab Rails application to use a distinct
-database for CI/CD tables was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64289)
-in GitLab 14.1. This feature is still under development, and is not ready for production use.
+For properly discovering allowed patterns between different databases
+the GitLab application implements the `lib/gitlab/database/gitlab_schemas.yml` YAML file.
-### Development setup
+This file provides a virtual classification of tables into a `gitlab_schema`
+which conceptually is similar to [PostgreSQL Schema](https://www.postgresql.org/docs/current/ddl-schemas.html).
+We decided as part of [using database schemas to better isolated CI decomposed features](https://gitlab.com/gitlab-org/gitlab/-/issues/333415)
+that we cannot use PostgreSQL schema due to complex migration procedures. Instead we implemented
+the concept of application-level classification.
+Each table of GitLab needs to have a `gitlab_schema` assigned:
-By default, GitLab is configured to use only one main database. To
-opt-in to use a main database, and CI database, modify the
-`config/database.yml` file to have a `main` and a `ci` database
-configurations.
+- `gitlab_main`: describes all tables that are being stored in the `main:` database (for example, like `projects`, `users`).
+- `gitlab_ci`: describes all CI tables that are being stored in the `ci:` database (for example, `ci_pipelines`, `ci_builds`).
+- `gitlab_shared`: describe all application tables that contain data across all decomposed databases (for example, `loose_foreign_keys_deleted_records`).
+- `...`: more schemas to be introduced with additional decomposed databases
-You can set this up using [GDK](#gdk-configuration) or by
-[manually configuring `config/database.yml`](#manually-set-up-the-cicd-database).
+The usage of schema enforces the base class to be used:
-#### GDK configuration
+- `ApplicationRecord` for `gitlab_main`
+- `Ci::ApplicationRecord` for `gitlab_ci`
+- `Gitlab::Database::SharedModel` for `gitlab_shared`
-If you are using GDK, you can follow the following steps:
+### The impact of `gitlab_schema`
-1. On the GDK root directory, run:
+The usage of `gitlab_schema` has a significant impact on the application.
+The `gitlab_schema` primary purpose is to introduce a barrier between different data access patterns.
- ```shell
- gdk config set gitlab.rails.databases.ci.enabled true
- ```
+This is used as a primary source of classification for:
-1. Open your `gdk.yml`, and confirm that it has the following lines:
+- [Discovering cross-joins across tables from different schemas](#removing-joins-between-ci_-and-non-ci_-tables)
+- [Discovering cross-database transactions across tables from different schemas](#removing-cross-database-transactions)
- ```yaml
- gitlab:
- rails:
- databases:
- ci:
- enabled: true
- ```
+### The special purpose of `gitlab_shared`
-1. Reconfigure GDK:
+`gitlab_shared` is a special case describing tables or views that by design contain data across
+all decomposed databases. This does describe application-defined tables (like `loose_foreign_keys_deleted_records`),
+Rails-defined tables (like `schema_migrations` or `ar_internal_metadata` as well as internal PostgreSQL tables
+(for example, `pg_attribute`).
- ```shell
- gdk reconfigure
- ```
+**Be careful** to use `gitlab_shared` as it requires special handling while accessing data.
+Since `gitlab_shared` shares not only structure but also data, the application needs to be written in a way
+that traverses all data from all databases in sequential manner.
-1. [Create the new CI/CD database](#create-the-new-database).
+```ruby
+Gitlab::Database::EachDatabase.each_model_connection([MySharedModel]) do |connection, connection_name|
+ MySharedModel.select_all_data...
+end
+```
-#### Manually set up the CI/CD database
+As such, migrations modifying data of `gitlab_shared` tables are expected to run across
+all decomposed databases.
-You can manually edit `config/database.yml` to split the databases.
-To do so, consider a `config/database.yml` file like the example below:
+## Migrations
-```yaml
-development:
- main:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_development
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
-
-test: &test
- main:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_test
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
-```
+Read [Migrations for Multiple Databases](migrations_for_multiple_databases.md).
-Edit it to split the databases into `main` and `ci`:
+## CI/CD Database
-```yaml
-development:
- main:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_development
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
- ci:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_development_ci
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
-
-test: &test
- main:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_test
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
- ci:
- adapter: postgresql
- encoding: unicode
- database: gitlabhq_test_ci
- host: /path/to/gdk/postgresql
- pool: 10
- prepared_statements: false
- variables:
- statement_timeout: 120s
-```
+> Support for configuring the GitLab Rails application to use a distinct
+database for CI/CD tables was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/64289)
+in GitLab 14.1. This feature is still under development, and is not ready for production use.
-Next, [create the new CI/CD database](#create-the-new-database).
+### Configure single database
-#### Create the new database
+By default, GDK is configured to run with multiple databases. To configure GDK to use a single database:
-After configuring GitLab for the two databases, create the new CI/CD database:
+1. On the GDK root directory, run:
-1. Create the new `ci:` database, load the DB schema into the `ci:` database,
- and run any pending migrations:
+ ```shell
+ gdk config set gitlab.rails.databases.ci.enabled false
+ ```
- ```shell
- bundle exec rails db:create db:schema:load:ci db:migrate
- ```
+1. Reconfigure GDK:
-1. Restart GDK:
+ ```shell
+ gdk reconfigure
+ ```
- ```shell
- gdk restart
- ```
+To switch back to using multiple databases, set `gitlab.rails.databases.ci.enabled` to `true` and run `gdk reconfigure`.
<!--
NOTE: The `validate_cross_joins!` method in `spec/support/database/prevent_cross_joins.rb` references
@@ -167,9 +112,9 @@ already many such examples that need to be fixed in
The following steps are the process to remove cross-database joins between
`ci_*` and non `ci_*` tables:
-1. **{check-circle}** Add all failing specs to the [`cross-join-allowlist.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/f5de89daeb468fc45e1e95a76d1b5297aa53da11/spec/support/database/cross-join-allowlist.yml)
+1. **{check-circle}** Add all failing specs to the [`cross-join-allowlist.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/spec/support/database/cross-join-allowlist.yml)
file.
-1. **{dotted-circle}** Find the code that caused the spec failure and wrap the isolated code
+1. **{check-circle}** Find the code that caused the spec failure and wrap the isolated code
in [`allow_cross_joins_across_databases`](#allowlist-for-existing-cross-joins).
Link to a new issue assigned to the correct team to remove the specs from the
`cross-join-allowlist.yml` file.
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
index de070f7e434..af7d569e282 100644
--- a/doc/development/database/not_null_constraints.md
+++ b/doc/development/database/not_null_constraints.md
@@ -197,7 +197,7 @@ end
If you have to clean up a nullable column for a [high-traffic table](../migration_style_guide.md#high-traffic-tables)
(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up)
+it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.
In that rare case you will need 3 releases end-to-end:
diff --git a/doc/development/database/post_deployment_migrations.md b/doc/development/database/post_deployment_migrations.md
new file mode 100644
index 00000000000..799eefdb875
--- /dev/null
+++ b/doc/development/database/post_deployment_migrations.md
@@ -0,0 +1,81 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Post Deployment Migrations
+
+Post deployment migrations are regular Rails migrations that can optionally be
+executed after a deployment. By default these migrations are executed alongside
+the other migrations. To skip these migrations you must set the
+environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value
+when running `rake db:migrate`.
+
+For example, this would run all migrations including any post deployment
+migrations:
+
+```shell
+bundle exec rake db:migrate
+```
+
+This however skips post deployment migrations:
+
+```shell
+SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
+```
+
+## Deployment Integration
+
+Say you're using Chef for deploying new versions of GitLab and you'd like to run
+post deployment migrations after deploying a new version. Let's assume you
+normally use the command `chef-client` to do so. To make use of this feature
+you'd have to run this command as follows:
+
+```shell
+SKIP_POST_DEPLOYMENT_MIGRATIONS=true sudo chef-client
+```
+
+Once all servers have been updated you can run `chef-client` again on a single
+server _without_ the environment variable.
+
+The process is similar for other deployment techniques: first you would deploy
+with the environment variable set, then you re-deploy a single
+server but with the variable _unset_.
+
+## Creating Migrations
+
+To create a post deployment migration you can use the following Rails generator:
+
+```shell
+bundle exec rails g post_deployment_migration migration_name_here
+```
+
+This generates the migration file in `db/post_migrate`. These migrations
+behave exactly like regular Rails migrations.
+
+## Use Cases
+
+Post deployment migrations can be used to perform migrations that mutate state
+that an existing version of GitLab depends on. For example, say you want to
+remove a column from a table. This requires downtime as a GitLab instance
+depends on this column being present while it's running. Normally you'd follow
+these steps in such a case:
+
+1. Stop the GitLab instance
+1. Run the migration removing the column
+1. Start the GitLab instance again
+
+Using post deployment migrations we can instead follow these steps:
+
+1. Deploy a new version of GitLab while ignoring post deployment migrations
+1. Re-run `rake db:migrate` but without the environment variable set
+
+Here we don't need any downtime as the migration takes place _after_ a new
+version (which doesn't depend on the column anymore) has been deployed.
+
+Some other examples where these migrations are useful:
+
+- Cleaning up data generated due to a bug in GitLab
+- Removing tables
+- Migrating jobs from one Sidekiq queue to another
diff --git a/doc/development/database/rename_database_tables.md b/doc/development/database/rename_database_tables.md
index 881adf00ad0..7a76c028042 100644
--- a/doc/development/database/rename_database_tables.md
+++ b/doc/development/database/rename_database_tables.md
@@ -82,7 +82,7 @@ when naming indexes, so there is a possibility that not all indexes are properly
the migration locally, check if there are inconsistently named indexes (`db/structure.sql`). Those can be
renamed manually in a separate migration, which can be also part of the release M.N+1.
- Foreign key columns might still contain the old table name. For smaller tables, follow our [standard column
-rename process](../avoiding_downtime_in_migrations.md#renaming-columns)
+rename process](avoiding_downtime_in_migrations.md#renaming-columns)
- Avoid renaming database tables which are using with triggers.
- Table modifications (add or remove columns) are not allowed during the rename process, please make sure that all changes to the table happen before the rename migration is started (or in the next release).
- As the index names might change, verify that the model does not use bulk insert
diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md
index 9674deb4603..4ed7cf1b4de 100644
--- a/doc/development/database/strings_and_the_text_data_type.md
+++ b/doc/development/database/strings_and_the_text_data_type.md
@@ -229,7 +229,7 @@ end
To keep this guide short, we skipped the definition of the background migration and only
provided a high level example of the post-deployment migration that is used to schedule the batches.
-You can find more information on the guide about [background migrations](../background_migrations.md)
+You can find more information on the guide about [background migrations](background_migrations.md)
#### Validate the text limit (next release)
@@ -277,7 +277,7 @@ end
If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
-it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up)
+it will need an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.
In that rare case you will need 3 releases end-to-end:
diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md
index 5319c73aad0..ec768136404 100644
--- a/doc/development/database/table_partitioning.md
+++ b/doc/development/database/table_partitioning.md
@@ -214,7 +214,7 @@ end
```
This step uses the same mechanism as any background migration, so you
-may want to read the [Background Migration](../background_migrations.md)
+may want to read the [Background Migration](background_migrations.md)
guide for details on that process. Background jobs are scheduled every
2 minutes and copy `50_000` records at a time, which can be used to
estimate the timing of the background migration portion of the