summaryrefslogtreecommitdiff
path: root/doc/development/migration_style_guide.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/migration_style_guide.md')
-rw-r--r--doc/development/migration_style_guide.md80
1 files changed, 76 insertions, 4 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md
index 009ead8ba16..f76b053c2bd 100644
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -19,8 +19,8 @@ Migrations are **not** allowed to require GitLab installations to be taken
offline ever. Migrations always must be written in such a way to avoid
downtime. In the past we had a process for defining migrations that allowed for
downtime by setting a `DOWNTIME` constant. You may see this when looking at
-older migrations. This process was in place for 4 years without every being
-used and as such we've learnt we can always figure out how to write a migration
+older migrations. This process was in place for 4 years without ever being
+used and as such we've learned we can always figure out how to write a migration
differently to avoid downtime.
When writing your migrations, also consider that databases might have stale data
@@ -35,10 +35,21 @@ For GitLab.com, please take into consideration that regular migrations (under `d
are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment),
and [post-deployment migrations](post_deployment_migrations.md) (`db/post_migrate`) are run after the deployment to production has finished.
+## Create database migrations
+
+To create a migration you can use the following Rails generator:
+
+```shell
+bundle exec rails g migration migration_name_here
+```
+
+This generates the migration file in `db/migrate`.
+
## Schema Changes
Changes to the schema should be committed to `db/structure.sql`. This
-file is automatically generated by Rails, so you normally should not
+file is automatically generated by Rails when you run
+`bundle exec rails db:migrate`, so you normally should not
edit this file by hand. If your migration is adding a column to a
table, that column is added at the bottom. Please do not reorder
columns manually for existing tables as this causes confusion to
@@ -205,6 +216,30 @@ def down
end
```
+**Multiple changes on the same table:**
+
+The helper `with_lock_retries` wraps all operations into a single transaction. When you have the lock,
+you should do as much as possible inside the transaction rather than trying to get another lock later.
+Be careful about running long database statements within the block. The acquired locks are kept until the transaction (block) finishes and depending on the lock type, it might block other database operations.
+
+```ruby
+include Gitlab::Database::MigrationHelpers
+
+def up
+ with_lock_retries do
+ add_column :users, :full_name, :string
+ add_column :users, :bio, :string
+ end
+end
+
+def down
+ with_lock_retries do
+ remove_column :users, :full_name
+ remove_column :users, :bio
+ end
+end
+```
+
**Removing a foreign key:**
```ruby
@@ -264,6 +299,8 @@ end
**Creating a new table when we have two foreign keys:**
+Only one foreign key should be created per migration. This is because [the addition of a foreign key constraint requires a `SHARE ROW EXCLUSIVE` lock on the referenced table](https://www.postgresql.org/docs/12/sql-createtable.html#:~:text=The%20addition%20of%20a%20foreign%20key%20constraint%20requires%20a%20SHARE%20ROW%20EXCLUSIVE%20lock%20on%20the%20referenced%20table), and locking multiple tables in the same transaction should be avoided.
+
For this, we need three migrations:
1. Creating the table without foreign keys (with the indices).
@@ -570,7 +607,7 @@ perform existence checks internally.
When adding a foreign-key constraint to either an existing or a new column also
remember to add an index on the column.
-This is **required** for all foreign-keys, e.g., to support efficient cascading
+This is **required** for all foreign-keys, for example, to support efficient cascading
deleting: when a lot of rows in a table get deleted, the referenced records need
to be deleted too. The database has to look for corresponding records in the
referenced table. Without an index, this results in a sequential scan on the
@@ -943,6 +980,41 @@ derived from the class name or namespace.
Be aware of the limitations [when using models in migrations](#using-models-in-migrations-discouraged).
+### Modifying existing data
+
+In most circumstances, prefer migrating data in **batches** when modifying data in the database.
+
+We introduced a new helper [each_batch_range](https://gitlab.com/gitlab-org/gitlab/-/blob/cd3e0a5cddcb464cb9b8c6e3275839cf57dfa6e2/lib/gitlab/database/dynamic_model_helpers.rb#L28-32) which facilitates the process of iterating over a collection in a performant way. The default size of the batch is defined in the `BATCH_SIZE` constant.
+
+See the following example to get an idea.
+
+**Purging data in batch:**
+
+```ruby
+include ::Gitlab::Database::DynamicModelHelpers
+
+disable_ddl_transaction!
+
+def up
+ each_batch_range('ci_pending_builds', scope: ->(table) { table.ref_protected }, of: BATCH_SIZE) do |min, max|
+ execute <<~SQL
+ DELETE FROM ci_pending_builds
+ USING ci_builds
+ WHERE ci_builds.id = ci_pending_builds.build_id
+ AND ci_builds.status != 'pending'
+ AND ci_builds.type = 'Ci::Build'
+ AND ci_pending_builds.id BETWEEN #{min} AND #{max}
+ SQL
+ end
+end
+```
+
+- The first argument is the table being modified: `'ci_pending_builds'`.
+- The second argument calls a lambda which fetches the relevant dataset selected (the default is set to `.all`): `scope: ->(table) { table.ref_protected }`.
+- The third argument is the batch size (the default is set in the `BATCH_SIZE` constant): `of: BATCH_SIZE`.
+
+Here is an [example MR](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/62195) illustrating how to use our new helper.
+
### Renaming reserved paths
When a new route for projects is introduced, it could conflict with any