summaryrefslogtreecommitdiff
path: root/doc/development/database/migrations_for_multiple_databases.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database/migrations_for_multiple_databases.md')
-rw-r--r--doc/development/database/migrations_for_multiple_databases.md390
1 files changed, 390 insertions, 0 deletions
diff --git a/doc/development/database/migrations_for_multiple_databases.md b/doc/development/database/migrations_for_multiple_databases.md
new file mode 100644
index 00000000000..0ec4612e985
--- /dev/null
+++ b/doc/development/database/migrations_for_multiple_databases.md
@@ -0,0 +1,390 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Migrations for Multiple databases
+
+> Support for describing migration purposes was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/73756) in GitLab 14.8.
+
+This document describes how to properly write database migrations
+for [the decomposed GitLab application using multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
+
+Learn more about general multiple databases support in a [separate document](multiple_databases.md).
+
+WARNING:
+If you experience any issues using `Gitlab::Database::Migration[2.0]`,
+you can temporarily revert back to the previous behavior by changing the version to `Gitlab::Database::Migration[1.0]`.
+Please report any issues with `Gitlab::Database::Migration[2.0]` in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/358430).
+
+The design for multiple databases (except for the Geo database) assumes
+that all decomposed databases have **the same structure** (for example, schema), but **the data is different** in each database. This means that some tables do not contain data on each database.
+
+## Operations
+
+Depending on the used constructs, we can classify migrations to be either:
+
+1. Modifying structure ([DDL - Data Definition Language](https://www.postgresql.org/docs/current/ddl.html)) (for example, `ALTER TABLE`).
+1. Modifying data ([DML - Data Manipulation Language](https://www.postgresql.org/docs/current/dml.html)) (for example, `UPDATE`).
+1. Performing [other queries](https://www.postgresql.org/docs/current/queries.html) (for example, `SELECT`) that are treated as **DML** for the purposes of our migrations.
+
+**The usage of `Gitlab::Database::Migration[2.0]` requires migrations to always be of a single purpose**.
+Migrations cannot mix **DDL** and **DML** changes as the application requires the structure
+(as described by `db/structure.sql`) to be exactly the same across all decomposed databases.
+
+### Data Definition Language (DDL)
+
+The DDL migrations are all migrations that:
+
+1. Create or drop a table (for example, `create_table`).
+1. Add or remove an index (for example, `add_index`, `add_index_concurrently`).
+1. Add or remove a foreign key (for example `add_foreign_key`, `add_foreign_key_concurrently`).
+1. Add or remove a column with or without a default value (for example, `add_column`).
+1. Create or drop trigger functions (for example, `create_trigger_function`).
+1. Attach or detach triggers from tables (for example, `track_record_deletions`, `untrack_record_deletions`).
+1. Prepare or not async indexes (for example, `prepare_async_index`, `unprepare_async_index_by_name`).
+
+As such DDL migrations **CANNOT**:
+
+1. Read or modify data in any form, via SQL statements or ActiveRecord models.
+1. Update column values (for example, `update_column_in_batches`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Read the state of feature flags since they are stored in `main:` (a `features` and `feature_gates`).
+1. Read application settings (as settings are stored in `main:`).
+
+As the majority of migrations in the GitLab codebase are of the DDL-type,
+this is also the default mode of operation and requires no further changes
+to the migrations files.
+
+#### Example: perform DDL on all databases
+
+Example migration adding a concurrent index that is treated as change of the structure (DDL)
+that is executed on all configured databases.
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+ end
+
+ def down
+ remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+ end
+end
+```
+
+### Data Manipulation Language (DML)
+
+The DML migrations are all migrations that:
+
+1. Read data via SQL statements (for example, `SELECT * FROM projects WHERE id=1`).
+1. Read data via ActiveRecord models (for example, `User < MigrationRecord`).
+1. Create, update or delete data via ActiveRecord models (for example, `User.create!(...)`).
+1. Create, update or delete data via SQL statements (for example, `DELETE FROM projects WHERE id=1`).
+1. Update columns in batches (for example, `update_column_in_batches(:projects, :archived, true)`).
+1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
+1. Access application settings (for example, `ApplicationSetting.last` if run for `main:` database).
+1. Read and modify feature flags if run for the `main:` database.
+
+The DML migrations **CANNOT**:
+
+1. Make any changes to DDL since this breaks the rule of keeping `structure.sql` coherent across
+ all decomposed databases.
+1. **Read data from another database**.
+
+To indicate the `DML` migration type, a migration must use the `restrict_gitlab_migration gitlab_schema:`
+syntax in a migration class. This marks the given migration as DML and restricts access to it.
+
+#### Example: perform DML only in context of the database containing the given `gitlab_schema`
+
+Example migration updating `archived` column of `projects` that is executed
+only for the database containing `gitlab_main` schema.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+#### Example: usage of `ActiveRecord` classes
+
+A migration using `ActiveRecord` class to perform data manipulation
+must use the `MigrationRecord` class. This class is guaranteed to provide
+a correct connection in a context of a given migration.
+
+Underneath the `MigrationRecord == ActiveRecord::Base`, as once the `db:migrate`
+runs, it switches the active connection of `ActiveRecord::Base.establish_connection :ci`.
+To avoid confusion to using the `ActiveRecord::Base`, `MigrationRecord` is required.
+
+This implies that DML migrations are forbidden to read data from other
+databases. For example, running migration in context of `ci:` and reading feature flags
+from `main:`, as no established connection to another database is present.
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ class Project < MigrationRecord
+ end
+
+ def up
+ Project.where(archived: false).each_batch of |batch|
+ batch.update_all(archived: true)
+ end
+ end
+
+ def down
+ end
+end
+```
+
+### The special purpose of `gitlab_shared`
+
+As described in [gitlab_schema](multiple_databases.md#the-special-purpose-of-gitlab_shared),
+the `gitlab_shared` tables are allowed to contain data across all databases. This implies
+that such migrations should run across all databases to modify structure (DDL) or modify data (DML).
+
+As such migrations accessing `gitlab_shared` do not need to use `restrict_gitlab_migration gitlab_schema:`,
+migrations without restriction run across all databases and are allowed to modify data on each of them.
+If the `restrict_gitlab_migration gitlab_schema:` is specified, the `DML` migration
+runs only in a context of a database containing the given `gitlab_schema`.
+
+#### Example: run DML `gitlab_shared` migration on all databases
+
+Example migration updating `loose_foreign_keys_deleted_records` table
+that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
+
+This migration is executed across all configured databases.
+
+```ruby
+class DeleteAllLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ def up
+ execute("DELETE FROM loose_foreign_keys_deleted_records")
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+#### Example: run DML `gitlab_shared` only on the database containing the given `gitlab_schema`
+
+Example migration updating `loose_foreign_keys_deleted_records` table
+that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
+
+This migration since it configures restriction on `gitlab_ci` is executed only
+in context of database containing `gitlab_ci` schema.
+
+```ruby
+class DeleteCiBuildsLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ execute("DELETE FROM loose_foreign_keys_deleted_records WHERE fully_qualified_table_name='ci_builds'")
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+### The behavior of skipping migrations
+
+The only migrations that are skipped are the ones performing **DML** changes.
+The **DDL** migrations are **always and unconditionally** executed.
+
+The implemented [solution](https://gitlab.com/gitlab-org/gitlab/-/issues/355014#solution-2-use-database_tasks)
+uses the `database_tasks:` as a way to indicate which additional database configurations
+(in `config/database.yml`) share the same primary database. The database configurations
+marked with `database_tasks: false` are exempt from executing `db:migrate` for those
+database configurations.
+
+If database configurations do not share databases (all do have `database_tasks: true`),
+each migration runs for every database configuration:
+
+1. The DDL migration applies all structure changes on all databases.
+1. The DML migration runs only in the context of a database containing the given `gitlab_schema:`.
+1. If the DML migration is not eligible to run, it is skipped. It's still
+ marked as executed in `schema_migrations`. While running `db:migrate`, the skipped
+ migration outputs `Current migration is skipped since it modifies 'gitlab_ci' which is outside of 'gitlab_main, gitlab_shared`.
+
+To prevent loss of migrations if the `database_tasks: false` is configured, a dedicated
+Rake task is used [`gitlab:db:validate_config`](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83118).
+The `gitlab:db:validate_config` validates the correctness of `database_tasks:` by checking database identifiers
+of each underlying database configuration. The ones that share the database are required to have
+the `database_tasks: false` set. `gitlab:db:validate_config` always runs before `db:migrate`.
+
+## Validation
+
+Validation in a nutshell uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
+each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
+The migration is skipped if the specified `gitlab_schema` is outside of a list of schemas
+managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
+
+The `Gitlab::Database::Migration[2.0]` includes `Gitlab::Database::MigrationHelpers::RestrictGitlabSchema`
+which extends the `#migrate` method. For the duration of a migration a dedicated query analyzer
+is installed `Gitlab::Database::QueryAnalyzers::RestrictAllowedSchemas` that accepts
+a list of allowed schemas as defined by `restrict_gitlab_migration:`. If the executed query
+is outside of allowed schemas, it raises an exception.
+
+## Exceptions
+
+Depending on misuse or lack of `restrict_gitlab_migration` various exceptions can be raised
+as part of the migration run and prevent the migration from being completed.
+
+### Exception 1: migration running in DDL mode does DML select
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # Missing:
+ # restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode
+Modifying of 'projects' (gitlab_main) with 'SELECT * FROM projects...
+```
+
+The current migration do not use `restrict_gitlab_migration`. The lack indicates a migration
+running in **DDL** mode, but the executed payload appears to be reading data from `projects`.
+
+**The solution** is to add `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 2: migration running in DML mode changes the structure
+
+```ruby
+class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # restrict_gitlab_migration if defined indicates DML, it should be removed
+ restrict_gitlab_migration gitlab_schema: :gitlab_main
+
+ INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
+ end
+
+ def down
+ remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
+ end
+end
+```
+
+```plaintext
+DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode.
+Modifying of 'merge_request_reviewers' with 'CREATE INDEX...
+```
+
+The current migration do use `restrict_gitlab_migration`. The presence indicates **DML** mode,
+but the executed payload appears to be doing structure changes (DDL).
+
+**The solution** is to remove `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
+
+### Exception 3: migration running in DML mode accesses data from a table in another schema
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # Since it modifies `projects` it should use `gitlab_main`
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+```plaintext
+Select/DML queries (SELECT/UPDATE/DELETE) do access 'projects' (gitlab_main) " \
+which is outside of list of allowed schemas: 'gitlab_ci'
+```
+
+The current migration do restrict the migration to `gitlab_ci`, but appears to modify
+data in `gitlab_main`.
+
+**The solution** is to change `restrict_gitlab_migration gitlab_schema: :gitlab_ci`.
+
+### Exception 4: mixing DDL and DML mode
+
+```ruby
+class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
+ disable_ddl_transaction!
+
+ # This migration is invalid regardless of specification
+ # as it cannot modify structure and data at the same time
+ restrict_gitlab_migration gitlab_schema: :gitlab_ci
+
+ def up
+ add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: 'index_on_merge_request_reviewers'
+ update_column_in_batches(:projects, :archived, true) do |table, query|
+ query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
+ end
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+The migrations mixing **DDL** and **DML** depending on ordering of operations raises
+one of the prior exceptions.
+
+## Upcoming changes on multiple database migrations
+
+The `restrict_gitlab_migration` using `gitlab_schema:` is considered as a first iteration
+of this feature for running migrations selectively depending on a context. It is possible
+to add additional restrictions to DML-only migrations (as the structure coherency is likely
+to stay as-is until further notice) to restrict when they run.
+
+A Potential extension is to limit running DML migration only to specific environments:
+
+```ruby
+restrict_gitlab_migration gitlab_schema: :gitlab_main, gitlab_env: :gitlab_com
+```