summaryrefslogtreecommitdiff
path: root/doc/development/migration_style_guide.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/migration_style_guide.md')
-rw-r--r--doc/development/migration_style_guide.md72
1 files changed, 36 insertions, 36 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md
index 84679a78545..8cdfbd558ca 100644
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -1,13 +1,13 @@
---
stage: none
group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Migration Style Guide
When writing migrations for GitLab, you have to take into account that
-these will be run by hundreds of thousands of organizations of all sizes, some with
+these are run by hundreds of thousands of organizations of all sizes, some with
many years of data in their database.
In addition, having to take a server offline for an upgrade small or big is a
@@ -44,15 +44,15 @@ and post-deployment migrations (`db/post_migrate`) are run after the deployment
Changes to the schema should be committed to `db/structure.sql`. This
file is automatically generated by Rails, so you normally should not
edit this file by hand. If your migration is adding a column to a
-table, that column will be added at the bottom. Please do not reorder
-columns manually for existing tables as this will cause confusion to
+table, that column is added at the bottom. Please do not reorder
+columns manually for existing tables as this causes confusion to
other people using `db/structure.sql` generated by Rails.
When your local database in your GDK is diverging from the schema from
`master` it might be hard to cleanly commit the schema changes to
Git. In that case you can use the `scripts/regenerate-schema` script to
regenerate a clean `db/structure.sql` for the migrations you're
-adding. This script will apply all migrations found in `db/migrate`
+adding. This script applies all migrations found in `db/migrate`
or `db/post_migrate`, so if there are any migrations you don't want to
commit to the schema, rename or remove them. If your branch is not
targeting `master` you can set the `TARGET` environment variable.
@@ -80,7 +80,7 @@ and whether they require downtime and how to work around that whenever possible.
Every migration must specify if it requires downtime or not, and if it should
require downtime it must also specify a reason for this. This is required even
-if 99% of the migrations won't require downtime as this makes it easier to find
+if 99% of the migrations don't require downtime as this makes it easier to find
the migrations that _do_ require downtime.
To tag a migration, add the following two constants to the migration class'
@@ -104,7 +104,7 @@ class MyMigration < ActiveRecord::Migration[6.0]
end
```
-It is an error (that is, CI will fail) if the `DOWNTIME` constant is missing
+It is an error (that is, CI fails) if the `DOWNTIME` constant is missing
from a migration class.
## Reversibility
@@ -136,7 +136,7 @@ By default, migrations are single transaction. That is, a transaction is opened
at the beginning of the migration, and committed after all steps are processed.
Running migrations in a single transaction makes sure that if one of the steps fails,
-none of the steps will be executed, leaving the database in valid state.
+none of the steps are executed, leaving the database in valid state.
Therefore, either:
- Put all migrations in one single-transaction migration.
@@ -146,15 +146,16 @@ Therefore, either:
For example, if you create an empty table and need to build an index for it,
it is recommended to use a regular single-transaction migration and the default
rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
-This is a blocking operation, but it won't cause problems because the table is not yet used,
+This is a blocking operation, but it doesn't cause problems because the table is not yet used,
and therefore it does not have any records yet.
## Heavy operations in a single transaction
-When using a single-transaction migration, a transaction will hold on a database connection
+When using a single-transaction migration, a transaction holds a database connection
for the duration of the migration, so you must make sure the actions in the migration
-do not take too much time: In general, queries executed in a migration need to fit comfortably
-within `15s` on GitLab.com.
+do not take too much time: GitLab.com’s production database has a `15s` timeout, so
+in general, the cumulative execution time in a migration should aim to fit comfortably
+in that limit. Singular query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries)
In case you need to insert, update, or delete a significant amount of data, you:
@@ -182,8 +183,8 @@ on the `users` table once it has been enqueued.
More information about PostgresSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
-set. When the migration is invoked, any database query will have
-a fixed time to execute. In a worst-case scenario, the request will sit in the
+set. When the migration is invoked, any database query has
+a fixed time to execute. In a worst-case scenario, the request sits in the
lock queue, blocking other queries for the duration of the configured statement timeout,
then failing with `canceling statement due to statement timeout` error.
@@ -274,7 +275,7 @@ end
**Creating a new table when we have two foreign keys:**
-For this, we'll need three migrations:
+For this, we need three migrations:
1. Creating the table without foreign keys (with the indices).
1. Add foreign key to the first table.
@@ -354,7 +355,7 @@ def up
end
```
-The RuboCop rule generally allows standard Rails migration methods, listed below. This example will cause a Rubocop offense:
+The RuboCop rule generally allows standard Rails migration methods, listed below. This example causes a Rubocop offense:
```ruby
disable_ddl_transaction!
@@ -399,9 +400,9 @@ In a worst-case scenario, the method:
- Executes the block for a maximum of 50 times over 40 minutes.
- Most of the time is spent in a pre-configured sleep period after each iteration.
-- After the 50th retry, the block will be executed without `lock_timeout`, just
+- After the 50th retry, the block is executed without `lock_timeout`, just
like a standard migration invocation.
-- If a lock cannot be acquired, the migration will fail with `statement timeout` error.
+- If a lock cannot be acquired, the migration fails with `statement timeout` error.
The migration might fail if there is a very long running transaction (40+ minutes)
accessing the `users` table.
@@ -437,9 +438,9 @@ class MyMigration < ActiveRecord::Migration[6.0]
end
```
-Here the call to `disable_statement_timeout` will use the connection local to
+Here the call to `disable_statement_timeout` uses the connection local to
the `with_multiple_threads` block, instead of re-using the global connection
-pool. This ensures each thread has its own connection object, and won't time
+pool. This ensures each thread has its own connection object, and doesn't time
out when trying to obtain one.
PostgreSQL has a maximum amount of connections that it allows. This
@@ -461,8 +462,9 @@ class MyMigration < ActiveRecord::Migration[6.0]
include Gitlab::Database::MigrationHelpers
disable_ddl_transaction!
+ INDEX_NAME = 'index_name'
def up
- remove_concurrent_index :table_name, :column_name, name: :index_name
+ remove_concurrent_index :table_name, :column_name, name: INDEX_NAME
end
end
```
@@ -582,7 +584,7 @@ remember to add an index on the column.
This is **required** for all foreign-keys, e.g., to support efficient cascading
deleting: when a lot of rows in a table get deleted, the referenced records need
to be deleted too. The database has to look for corresponding records in the
-referenced table. Without an index, this will result in a sequential scan on the
+referenced table. Without an index, this results in a sequential scan on the
table, which can take a long time.
Here's an example where we add a new column with a foreign key
@@ -620,7 +622,7 @@ the standard `add_column` helper should be used in all cases.
Before PostgreSQL 11, adding a column with a default was problematic as it would
have caused a full table rewrite. The corresponding helper `add_column_with_default`
-has been deprecated and will be removed in a later release.
+has been deprecated and is scheduled to be removed in a later release.
If a backport adding a column with a default value is needed for %12.9 or earlier versions,
it should use `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
@@ -656,7 +658,7 @@ In this particular case, the default value exists and we're just changing the me
`request_access_enabled` column, which does not imply a rewrite of all the existing records
in the `namespaces` table. Only when creating a new column with a default, all the records are going be rewritten.
-NOTE: **Note:**
+NOTE:
A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
was introduced on PostgresSQL 11.0, removing the need of rewriting the table when a new column with a default value is added.
@@ -666,7 +668,7 @@ without requiring `disable_ddl_transaction!`.
## Updating an existing column
To update an existing column to a particular value, you can use
-`update_column_in_batches`. This will split the updates into batches, so we
+`update_column_in_batches`. This splits the updates into batches, so we
don't update too many rows at in a single statement.
This updates the column `foo` in the `projects` table to 10, where `some_column`
@@ -781,12 +783,12 @@ end
## Integer column type
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
-a max value of 2,147,483,647. Be aware of this when creating a column that will
-hold file sizes in byte units. If you are tracking file size in bytes, this
+a max value of 2,147,483,647. Be aware of this when creating a column that
+holds file sizes in byte units. If you are tracking file size in bytes, this
restricts the maximum file size to just over 2GB.
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
-set the limit to 8-bytes. This will allow the column to hold a value up to
+set the limit to 8-bytes. This allows the column to hold a value up to
`9,223,372,036,854,775,807`.
Rails migration example:
@@ -836,7 +838,7 @@ timestamps with timezones:
- `datetime_with_timezone`
This ensures all timestamps have a time zone specified. This, in turn, means
-existing timestamps won't suddenly use a different timezone when the system's
+existing timestamps don't suddenly use a different timezone when the system's
timezone changes. It also makes it very clear which timezone was used in the
first place.
@@ -937,19 +939,17 @@ Since we had to do this a few times already, there are now some helpers to help
with this.
To use this you can include `Gitlab::Database::RenameReservedPathsMigration::V1`
-in your migration. This will provide 3 methods which you can pass one or more
+in your migration. This provides 3 methods which you can pass one or more
paths that need to be rejected.
-**`rename_root_paths`**: This will rename the path of all _namespaces_ with the
+- **`rename_root_paths`**: Renames the path of all _namespaces_ with the
given name that don't have a `parent_id`.
-
-**`rename_child_paths`**: This will rename the path of all _namespaces_ with the
+- **`rename_child_paths`**: Renames the path of all _namespaces_ with the
given name that have a `parent_id`.
-
-**`rename_wildcard_paths`**: This will rename the path of all _projects_, and all
+- **`rename_wildcard_paths`**: Renames the path of all _projects_, and all
_namespaces_ that have a `project_id`.
-The `path` column for these rows will be renamed to their previous value followed
+The `path` column for these rows are renamed to their previous value followed
by an integer. For example: `users` would turn into `users0`
## Using models in migrations (discouraged)