Diffstat (limited to 'doc/development/migration_style_guide.md')
-rw-r--r-- | doc/development/migration_style_guide.md | 82
1 file changed, 42 insertions, 40 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md
index 5764c876e4d..8f035d4aa13 100644
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -184,13 +184,21 @@ git checkout origin/master db/structure.sql
 VERSION=<migration ID> bundle exec rails db:migrate:main
 ```
 
-### Adding new tables to GitLab Schema
-
-GitLab connects to two different Postgres databases: `main` and `ci`. New tables should be defined in [`lib/gitlab/database/gitlab_schemas.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/gitlab_schemas.yml) with the databases they need to be added to.
-
- ```yaml
- <TABLE_NAME>: :gitlab_main
- ```
+### Adding new tables to the database dictionary
+
+GitLab connects to two different Postgres databases: `main` and `ci`. New tables should be defined in [`db/docs/`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/db/docs):
+
+```yaml
+table_name: table name example
+description: Description example
+introduced_by_url: Merge request link
+milestone: Milestone example
+feature_categories:
+- Feature category example
+classes:
+- Class example
+gitlab_schema: gitlab_main
+```
 
 ## Avoiding downtime
@@ -305,7 +313,7 @@ of migration helpers.
 In this example, we use version 2.1 of the migration class:
 
 ```ruby
-class TestMigration < Gitlab::Database::Migration[2.0]
+class TestMigration < Gitlab::Database::Migration[2.1]
   def change
   end
 end
 ```
@@ -338,7 +346,7 @@ is concurrently accessed and modified by other processes, acquiring the lock may
 a while. The lock request is waiting in a queue and it may also block other queries
 on the `users` table once it has been enqueued.
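An aside for reviewers of this hunk: the lock-queue behavior described above is why migrations retry lock acquisition with backoff instead of waiting indefinitely. The following is a plain-Ruby sketch of that retry loop; the helper name `with_lock_retries` mirrors the doc, but the error class, timings, and logic here are illustrative assumptions, not GitLab's actual implementation.

```ruby
# Simulated "lock could not be acquired in time" failure.
class LockTimeoutError < StandardError; end

# Retry the block with progressively longer pauses, mimicking how a
# lock-retry helper re-attempts DDL when the lock times out.
def with_lock_retries(attempts: 5, base_delay: 0.01)
  tries = 0
  begin
    yield
  rescue LockTimeoutError
    tries += 1
    raise if tries >= attempts  # give up after the last attempt
    sleep(base_delay * tries)   # back off before trying again
    retry
  end
end

# Simulate a lock that only becomes free on the third attempt.
calls = 0
with_lock_retries(base_delay: 0) do
  calls += 1
  raise LockTimeoutError if calls < 3
end
puts calls # => 3
```

The point of the backoff is that each failed attempt releases the queue slot, so other queries on the table are not blocked behind the migration while it waits.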
-More information about PostgresSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
+More information about PostgreSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
 
 For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
 set. When the migration is invoked, any database query has
@@ -376,7 +384,7 @@ Occasionally a migration may need to acquire multiple locks on different objects
 To prevent catalog bloat, ask for all those locks explicitly before performing any DDL.
 A better strategy is to split the migration, so that we only need to acquire one lock at the time.
 
-**Removing a column:**
+#### Removing a column
 
 ```ruby
 enable_lock_retries!
@@ -386,7 +394,7 @@ def change
 end
 ```
 
-**Multiple changes on the same table:**
+#### Multiple changes on the same table
 
 With the lock-retry methodology enabled, all operations wrap into a single transaction. When you have the lock,
 you should do as much as possible inside the transaction rather than trying to get another lock later.
@@ -406,7 +414,7 @@ def down
 end
 ```
 
-**Removing a foreign key:**
+#### Removing a foreign key
 
 ```ruby
 enable_lock_retries!
@@ -420,7 +428,7 @@ def down
 end
 ```
 
-**Changing default value for a column:**
+#### Changing default value for a column
 
 ```ruby
 enable_lock_retries!
@@ -434,7 +442,7 @@ def down
 end
 ```
 
-**Creating a new table with a foreign key:**
+#### Creating a new table with a foreign key
 
 We can wrap the `create_table` method with `with_lock_retries`:
 
 ```ruby
@@ -453,7 +461,7 @@ def down
 end
 ```
 
-**Creating a new table when we have two foreign keys:**
+#### Creating a new table when we have two foreign keys
 
 Only one foreign key should be created per transaction.
 This is because [the addition of a foreign key constraint requires a `SHARE ROW EXCLUSIVE` lock on the referenced table](https://www.postgresql.org/docs/12/sql-createtable.html#:~:text=The%20addition%20of%20a%20foreign%20key%20constraint%20requires%20a%20SHARE%20ROW%20EXCLUSIVE%20lock%20on%20the%20referenced%20table),
 and locking multiple tables in the same transaction should be avoided.
@@ -600,7 +608,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration class
 like so:
 
 ```ruby
-class MyMigration < Gitlab::Database::Migration[2.0]
+class MyMigration < Gitlab::Database::Migration[2.1]
   disable_ddl_transaction!
 
   INDEX_NAME = 'index_name'
@@ -611,10 +619,10 @@ class MyMigration < Gitlab::Database::Migration[2.0]
   end
 end
 ```
 
-Verify the index is not being used anymore with this Thanos query:
+You can verify that the index is not being used with [Thanos](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20by%20(type)(rate(pg_stat_user_indexes_idx_scan%7Benv%3D%22gprd%22%2C%20indexrelname%3D%22INSERT%20INDEX%20NAME%20HERE%22%7D%5B30d%5D))&g0.tab=1&g0.stacked=0&g0.range_input=1h&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D):
 
 ```sql
-sum by (type)(rate(pg_stat_user_indexes_idx_scan{env="gprd", indexrelname="index_groups_on_parent_id_id"}[5m]))
+sum by (type)(rate(pg_stat_user_indexes_idx_scan{env="gprd", indexrelname="INSERT INDEX NAME HERE"}[30d]))
 ```
 
 Note that it is not necessary to check if the index exists prior to
@@ -657,7 +665,7 @@ by calling the method `disable_ddl_transaction!` in the body of your migration class
 like so:
 
 ```ruby
-class MyMigration < Gitlab::Database::Migration[2.0]
+class MyMigration < Gitlab::Database::Migration[2.1]
   disable_ddl_transaction!
 
   INDEX_NAME = 'index_name'
@@ -700,7 +708,7 @@ The easiest way to test for existence of an index by name is to use the
 be used with a name option.
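A note on the "not necessary to check if the index exists" remark above: the concurrent index helpers are idempotent, so re-running an interrupted migration is safe. The toy `FakeSchema` below is a stand-in written for this review, not GitLab's API; it only illustrates why add/remove become no-ops on a second run.

```ruby
require 'set'

# Toy schema that tracks index names, so the helpers below can be
# no-ops when re-run, mirroring the idempotent concurrent index helpers.
class FakeSchema
  def initialize
    @indexes = Set.new
  end

  def add_index(name)
    return if @indexes.include?(name) # already present: nothing to do
    @indexes << name
  end

  def remove_index(name)
    @indexes.delete(name) # deleting a missing index is harmless
  end

  def index?(name)
    @indexes.include?(name)
  end
end

schema = FakeSchema.new
2.times { schema.add_index('index_name') }    # second run is a no-op, not an error
puts schema.index?('index_name')              # => true
2.times { schema.remove_index('index_name') } # safe even when already removed
puts schema.index?('index_name')              # => false
```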
 For example:
 
 ```ruby
-class MyMigration < Gitlab::Database::Migration[2.0]
+class MyMigration < Gitlab::Database::Migration[2.1]
   INDEX_NAME = 'index_name'
 
   def up
@@ -735,7 +743,7 @@ Here's an example where we add a new column with a foreign key constraint. Note
 it includes `index: true` to create an index for it.
 
 ```ruby
-class Migration < Gitlab::Database::Migration[2.0]
+class Migration < Gitlab::Database::Migration[2.1]
   def change
     add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
@@ -766,12 +774,7 @@ With PostgreSQL 11 being the minimum version in GitLab 13.0 and later, adding co
 the standard `add_column` helper should be used in all cases.
 
 Before PostgreSQL 11, adding a column with a default was problematic as it would
-have caused a full table rewrite. The corresponding helper `add_column_with_default`
-has been deprecated and is scheduled to be removed in a later release.
-
-If a backport adding a column with a default value is needed for %12.9 or earlier versions,
-it should use `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
-is involved, backporting to %12.9 is contraindicated.
+have caused a full table rewrite.
 
 ## Removing the column default for non-nullable columns
@@ -796,7 +799,7 @@ expensive and disruptive operation for larger tables, but in reality it's not.
 Take the following migration as an example:
 
 ```ruby
-class DefaultRequestAccessGroups < Gitlab::Database::Migration[2.0]
+class DefaultRequestAccessGroups < Gitlab::Database::Migration[2.1]
   def change
     change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
   end
@@ -818,7 +821,7 @@ in the `namespaces` table.
 Only when creating a new column with a default, all the records need to be rewritten.
 
 NOTE:
 A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
-was introduced on PostgresSQL 11.0, removing the need of rewriting the table when a new column with a default value is added.
+was introduced on PostgreSQL 11.0, removing the need of rewriting the table when a new column with a default value is added.
 
 For the reasons mentioned above, it's safe to use `change_column_default`
 in a single-transaction migration without requiring `disable_ddl_transaction!`.
@@ -852,8 +855,7 @@ update_column_in_batches(:projects, :foo, update_value) do |table, query|
 end
 ```
 
-Like `add_column_with_default`, there is a RuboCop cop to detect usage of this
-on large tables. In the case of `update_column_in_batches`, it may be acceptable
+In the case of `update_column_in_batches`, it may be acceptable
 to run on a large table, as long as it is only updating a small subset of the
 rows in the table, but do not ignore that without validating on the GitLab.com
 staging environment - or asking someone else to do so for you - beforehand.
@@ -991,7 +993,7 @@ Re-add a sequence:
 A Rails migration example:
 
 ```ruby
-class DropSequenceTest < Gitlab::Database::Migration[2.0]
+class DropSequenceTest < Gitlab::Database::Migration[2.1]
   def up
     drop_sequence(:ci_pipelines_config, :pipeline_id, :ci_pipelines_config_pipeline_id_seq)
   end
@@ -1022,7 +1024,7 @@ Under the hood, it works like this:
 - Add the primary key using the index defined beforehand.
 
 ```ruby
-class SwapPrimaryKey < Gitlab::Database::Migration[2.0]
+class SwapPrimaryKey < Gitlab::Database::Migration[2.1]
   TABLE_NAME = :table_name
   PRIMARY_KEY = :table_name_pkey
   OLD_INDEX_NAME = :old_index_name
@@ -1113,18 +1115,18 @@ Rails 5 natively supports `JSONB` (binary JSON) column type.
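For reviewers unfamiliar with `update_column_in_batches` (touched in the hunk above): the idea is to walk the table in primary-key slices so no single statement updates too many rows at once. A plain-Ruby sketch of that shape, using an in-memory array as a stand-in table (the data and batch size are made up for illustration):

```ruby
# Stand-in "table": ten rows whose :foo column starts out NULL.
rows = (1..10).map { |id| { id: id, foo: nil } }
batch_size = 3

# Walk the rows in fixed-size slices; in a real migration each slice
# would be one UPDATE ... WHERE id BETWEEN lower AND upper statement.
rows.each_slice(batch_size) do |batch|
  batch.each { |row| row[:foo] = 'bar' }
end

puts rows.count { |r| r[:foo] == 'bar' } # => 10
```

Keeping each slice in its own statement is what bounds lock duration and keeps any one transaction small, which is why the doc still warns about running it over a large table unless only a small subset of rows changes.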
 Example migration adding this column:
 
 ```ruby
-class AddOptionsToBuildMetadata < Gitlab::Database::Migration[2.0]
+class AddOptionsToBuildMetadata < Gitlab::Database::Migration[2.1]
   def change
     add_column :ci_builds_metadata, :config_options, :jsonb
   end
 end
 ```
 
-You have to use a serializer to provide a translation layer:
+By default hash keys will be strings. Optionally you can add a custom data type to provide different access to keys.
 
 ```ruby
 class BuildMetadata
-  serialize :config_options, Serializers::Json # rubocop:disable Cop/ActiveRecordSerialize
+  attribute :config_options, :ind_jsonb # for indifferent access, or :sym_jsonb if you need symbols only as keys.
 end
 ```
@@ -1145,7 +1147,7 @@ Do not store `attr_encrypted` attributes as `:text` in the database; use
 efficient:
 
 ```ruby
-class AddSecretToSomething < Gitlab::Database::Migration[2.0]
+class AddSecretToSomething < Gitlab::Database::Migration[2.1]
   def change
     add_column :something, :encrypted_secret, :binary
     add_column :something, :encrypted_secret_iv, :binary
@@ -1203,7 +1205,7 @@ If you need more complex logic, you can define and use models local to a
 migration. For example:
 
 ```ruby
-class MyMigration < Gitlab::Database::Migration[2.0]
+class MyMigration < Gitlab::Database::Migration[2.1]
   class Project < MigrationRecord
     self.table_name = 'projects'
   end
@@ -1227,7 +1229,7 @@ Be aware of the limitations [when using models in migrations](#using-models-in-m
 In most circumstances, prefer migrating data in **batches** when modifying data
 in the database.
 
-We introduced a new helper [each_batch_range](https://gitlab.com/gitlab-org/gitlab/-/blob/cd3e0a5cddcb464cb9b8c6e3275839cf57dfa6e2/lib/gitlab/database/dynamic_model_helpers.rb#L28-32) which facilitates the process of iterating over a collection in a performant way. The default size of the batch is defined in the `BATCH_SIZE` constant.
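To make the `:ind_jsonb` change in the hunk above concrete: "indifferent access" means the JSONB hash stores string keys (as JSONB round-trips them) but can be read with either strings or symbols. The wrapper below is a minimal plain-Ruby sketch of that behavior written for this review; the real `:ind_jsonb` type is provided by GitLab's attribute types, not by this class.

```ruby
# Minimal sketch of indifferent access: string keys under the hood,
# readable via either strings or symbols.
class IndifferentOptions
  def initialize(hash)
    @hash = hash.transform_keys(&:to_s) # JSONB returns keys as strings
  end

  def [](key)
    @hash[key.to_s] # accept both :image and 'image'
  end
end

options = IndifferentOptions.new(image: 'ruby:3.0')
puts options[:image]  # => ruby:3.0
puts options['image'] # => ruby:3.0
```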
+We introduced a new helper [`each_batch_range`](https://gitlab.com/gitlab-org/gitlab/-/blob/cd3e0a5cddcb464cb9b8c6e3275839cf57dfa6e2/lib/gitlab/database/dynamic_model_helpers.rb#L28-32) which facilitates the process of iterating over a collection in a performant way. The default size of the batch is defined in the `BATCH_SIZE` constant.
 
 See the following example to get an idea.
@@ -1302,7 +1304,7 @@ in a previous migration.
 It is important not to leave out the `User.reset_column_information` command, to ensure that the old schema is dropped from the cache and
 ActiveRecord loads the updated schema information.
 
 ```ruby
-class AddAndSeedMyColumn < Gitlab::Database::Migration[2.0]
+class AddAndSeedMyColumn < Gitlab::Database::Migration[2.1]
   class User < MigrationRecord
     self.table_name = 'users'
   end
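The batching behind `each_batch_range` can be pictured as deriving consecutive `[lower, upper]` id ranges of a fixed size. The function below is a plain-Ruby sketch written for this review (the name `batch_ranges` and the `BATCH_SIZE` value are assumptions mirroring the doc, not the real implementation, which works against the database):

```ruby
BATCH_SIZE = 1000 # illustrative default, echoing the constant named in the doc

# Split the id space [min_id, max_id] into consecutive inclusive ranges
# of at most `of` ids each.
def batch_ranges(min_id, max_id, of: BATCH_SIZE)
  ranges = []
  lower = min_id
  while lower <= max_id
    upper = [lower + of - 1, max_id].min # last range may be short
    ranges << [lower, upper]
    lower = upper + 1
  end
  ranges
end

p batch_ranges(1, 2500) # => [[1, 1000], [1001, 2000], [2001, 2500]]
```

Each range then becomes one small query (or one background-migration job), which is the "performant iteration" the helper provides.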