Prepare for zero downtime migrations (zero-downtime-migrations)

Starting with GitLab 9.1.0 we will no longer allow downtime migrations unless absolutely necessary. This commit updates the various developer guides and adds code that is necessary to make zero downtime migrations less painful.
author: Yorick Peterse <yorickpeterse@gmail.com> 2017-04-12 18:15:19 +0200
committer: Yorick Peterse <yorickpeterse@gmail.com> 2017-04-12 18:15:19 +0200
commit: 223d8a3d26a7561fcae9536efbf120d7c4760bd4 (patch)
tree: e87582544e973da9ddb6fba97b2958bbc0508751 /doc/development/what_requires_downtime.md
parent: a179c5ca412ebf1fbe7432c654f4bea6d155233b (diff)
download: gitlab-ce-223d8a3d26a7561fcae9536efbf120d7c4760bd4.tar.gz
1 files changed, 181 insertions, 56 deletions
diff --git a/doc/development/what_requires_downtime.md b/doc/development/what_requires_downtime.md
index bbcd26477f3..8da6ad684f5 100644
--- a/doc/development/what_requires_downtime.md
+++ b/doc/development/what_requires_downtime.md
@@ -2,7 +2,8 @@
 
 When working with a database certain operations can be performed without taking
 GitLab offline, others do require a downtime period. This guide describes
-various operations and their impact.
+various operations, their impact, and how to perform them without requiring
+downtime.
 
 ## Adding Columns
 
@@ -41,50 +42,156 @@ information on how to use this method.
 
 ## Dropping Columns
 
-On PostgreSQL you can safely remove an existing column without the need for
-downtime. When you drop a column in PostgreSQL it's not immediately removed,
-instead it is simply disabled. The data is removed on the next vacuum run.
+Removing columns is tricky because running GitLab processes may still be using
+the columns. To work around this you will need two separate merge requests and
+releases: one to ignore and then remove the column, and one to remove the ignore
+rule.
 
-On MySQL this operation requires downtime.
+### Step 1: Ignoring The Column
 
-While database wise dropping a column may be fine on PostgreSQL this operation
-still requires downtime because the application code may still be using the
-column that was removed. For example, consider the following migration:
+The first step is to ignore the column in the application code. This is
+necessary because Rails caches the columns and re-uses this cache in various
+places. This can be done by including the `IgnorableColumn` module into the
+model, followed by defining the columns to ignore. For example, to ignore
+`updated_at` in the User model you'd use the following:
 
 ```ruby
-class MyMigration < ActiveRecord::Migration
-  def change
-    remove_column :projects, :dummy
-  end
+class User < ActiveRecord::Base
+  include IgnorableColumn
+
+  ignore_column :updated_at
 end
 ```
 
-Now imagine that the GitLab instance is running and actively uses the `dummy`
-column. If we were to run the migration this would result in the GitLab instance
-producing errors whenever it tries to use the `dummy` column.
+Once added you should create a _post-deployment_ migration that removes the
+column. Both these changes should be submitted in the same merge request.
 
-As a result of the above downtime _is_ required when removing a column, even
-when using PostgreSQL.
+### Step 2: Removing The Ignore Rule
+
+Once the changes from step 1 have been released & deployed you can set up a
+separate merge request that removes the ignore rule. This merge request can
+simply remove the `ignore_column` line, and the `include IgnorableColumn` line
+if no other `ignore_column` calls remain.
 
 ## Renaming Columns
 
-Renaming columns requires downtime as running GitLab instances will continue
-using the old column name until a new version is deployed. This can result
-in the instance producing errors, which in turn can impact the user experience.
+Renaming columns the normal way requires downtime as an application may continue
+using the old column name during/after a database migration. To rename a column
+without requiring downtime we need two migrations: a regular migration, and a
+post-deployment migration. Both of these migrations can go in the same release.
 
-## Changing Column Constraints
+### Step 1: Add The Regular Migration
+
+First we need to create the regular migration. This migration should use
+`Gitlab::Database::MigrationHelpers#rename_column_concurrently` to perform the
+renaming. For example:
+
+```ruby
+# A regular migration in db/migrate
+class RenameUsersUpdatedAtToUpdatedAtTimestamp < ActiveRecord::Migration
+  include Gitlab::Database::MigrationHelpers
+
+  disable_ddl_transaction!
+
+  def up
+    rename_column_concurrently :users, :updated_at, :updated_at_timestamp
+  end
+
+  def down
+    cleanup_concurrent_column_rename :users, :updated_at_timestamp, :updated_at
+  end
+end
+```
+
+This will take care of renaming the column, ensuring data stays in sync, copying
+over indexes and foreign keys, etc.
+
+**NOTE:** if a column has one or more indexes whose names do not contain the
+name of the original column, the above procedure will fail. In this case you
+will first need to rename these indexes.
 
-Generally changing column constraints requires checking all rows in the table to
-see if they meet the new constraint, unless a constraint is _removed_. For
-example, changing a column that previously allowed NULL values to not allow NULL
-values requires the database to verify all existing rows.
+### Step 2: Add A Post-Deployment Migration
 
-The specific behaviour varies a bit between databases but in general the safest
-approach is to assume changing constraints requires downtime.
+The renaming procedure requires some cleaning up in a post-deployment migration.
+We can perform this cleanup using
+`Gitlab::Database::MigrationHelpers#cleanup_concurrent_column_rename`:
+
+```ruby
+# A post-deployment migration in db/post_migrate
+class CleanupUsersUpdatedAtRename < ActiveRecord::Migration
+  include Gitlab::Database::MigrationHelpers
+
+  disable_ddl_transaction!
+
+  def up
+    cleanup_concurrent_column_rename :users, :updated_at, :updated_at_timestamp
+  end
+
+  def down
+    rename_column_concurrently :users, :updated_at_timestamp, :updated_at
+  end
+end
+```
+
+## Changing Column Constraints
+
+Adding or removing a NOT NULL clause (or another constraint) can typically be
+done without requiring downtime. However, this does require that any application
+changes are deployed _first_. Thus, changing the constraints of a column should
+happen in a post-deployment migration.
 
 ## Changing Column Types
 
-This operation requires downtime.
+Changing the type of a column can be done using
+`Gitlab::Database::MigrationHelpers#change_column_type_concurrently`. This
+method works similarly to `rename_column_concurrently`. For example, let's say
+we want to change the type of `users.username` from `string` to `text`.
+
+### Step 1: Create A Regular Migration
+
+A regular migration is used to create a new column with a temporary name along
+with setting up some triggers to keep data in sync. Such a migration would look
+as follows:
+
+```ruby
+# A regular migration in db/migrate
+class ChangeUsersUsernameStringToText < ActiveRecord::Migration
+  include Gitlab::Database::MigrationHelpers
+
+  disable_ddl_transaction!
+
+  def up
+    change_column_type_concurrently :users, :username, :text
+  end
+
+  def down
+    cleanup_concurrent_column_type_change :users, :username
+  end
+end
+```
+
+### Step 2: Create A Post Deployment Migration
+
+Next we need to clean up our changes using a post-deployment migration:
+
+```ruby
+# A post-deployment migration in db/post_migrate
+class ChangeUsersUsernameStringToTextCleanup < ActiveRecord::Migration
+  include Gitlab::Database::MigrationHelpers
+
+  disable_ddl_transaction!
+
+  def up
+    cleanup_concurrent_column_type_change :users, :username
+  end
+
+  def down
+    change_column_type_concurrently :users, :username, :string
+  end
+end
+```
+
+And that's it, we're done!
 
 ## Adding Indexes
 
@@ -101,12 +208,19 @@ Migrations can take advantage of this by using the method
 
 ```ruby
 class MyMigration < ActiveRecord::Migration
-  def change
+  def up
     add_concurrent_index :projects, :column_name
   end
+
+  def down
+    remove_index(:projects, :column_name) if index_exists?(:projects, :column_name)
+  end
 end
 ```
 
+Note that `add_concurrent_index` cannot be reversed automatically, thus you
+need to manually define `up` and `down`.
+
 When running this on PostgreSQL the `CONCURRENTLY` option mentioned above is
 used. On MySQL this method produces a regular `CREATE INDEX` query.
 
@@ -125,43 +239,54 @@ This operation is safe as there's no code using the table just yet.
 
 ## Dropping Tables
 
-This operation requires downtime as application code may still be using the
-table.
+Dropping tables can be done safely using a post-deployment migration, but only
+if the application no longer uses the table.
 
 ## Adding Foreign Keys
 
-Adding foreign keys acquires an exclusive lock on both the source and target
-tables in PostgreSQL. This requires downtime as otherwise the entire application
-grinds to a halt for the duration of the operation.
+Adding foreign keys usually works in 3 steps:
+
+1. Start a transaction
+1. Run `ALTER TABLE` to add the constraint(s)
+1. Check all existing data
 
-On MySQL this operation also requires downtime _unless_ foreign key checks are
-disabled. Because this means checks aren't enforced this is not ideal, as such
-one should assume MySQL also requires downtime.
+Because `ALTER TABLE` typically holds an exclusive lock until the end of the
+transaction, this approach would require downtime.
+
+GitLab allows you to work around this by using
+`Gitlab::Database::MigrationHelpers#add_concurrent_foreign_key`. This method
+ensures that when PostgreSQL is used no downtime is needed.
 
 ## Removing Foreign Keys
 
-This operation should not require downtime on both PostgreSQL and MySQL.
+This operation does not require downtime.
 
-## Updating Data
+## Data Migrations
 
-Updating data should generally be safe. The exception to this is data that's
-being migrated from one version to another while the application still produces
-data in the old version.
+Data migrations can be tricky. The usual approach is to migrate data in three
+steps:
 
-For example, imagine the application writes the string `'dog'` to a column but
-it really is meant to write `'cat'` instead. One might think that the following
-migration is all that is needed to solve this problem:
+1. Migrate the initial batch of data
+1. Deploy the application code
+1. Migrate any remaining data
 
-```ruby
-class MyMigration < ActiveRecord::Migration
-  def up
-    execute("UPDATE some_table SET column = 'cat' WHERE column = 'dog';")
-  end
-end
-```
+Usually this works, but not always. For example, if a field's format is to be
+changed from JSON to something else we have a bit of a problem. If we were to
+change existing data before deploying application code we'll most likely run
+into errors. On the other hand, if we were to migrate after deploying the
+application code we could run into the same problems.
+
+If you merely need to correct some invalid data, then a post-deployment
+migration is usually enough. If you need to change the format of data (e.g. from
+JSON to something else) it's typically best to add a new column for the new data
+format, and have the application use that. In such a case the procedure would
+be:
 
-Unfortunately this is not enough. Because the application is still running and
-using the old value this may result in the table still containing rows where
-`column` is set to `dog`, even after the migration finished.
+1. Add a new column in the new format
+1. Copy over existing data to this new column
+1. Deploy the application code
+1. In a post-deployment migration, copy over any remaining data
 
-In these cases downtime _is_ required, even for rarely updated tables.
+In general there is no one-size-fits-all solution, therefore it's best to
+discuss these kinds of migrations in a merge request to make sure they are
+implemented in the best way possible.
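
Note on the "Dropping Columns" section of the patch above: the guide says Rails caches a model's columns, which is why the column has to be ignored before it is removed. The sketch below illustrates that idea in plain Ruby. It is not taken from this commit and may differ from GitLab's actual `IgnorableColumn` module: `FakeRecord` is a hypothetical stand-in for `ActiveRecord::Base`, and the `ignored_columns` helper is an assumed name used only for illustration.

```ruby
require 'set'

# Hypothetical stand-in for ActiveRecord::Base: it exposes a cached list of
# column objects, the way a Rails model does after loading its schema.
class FakeRecord
  Column = Struct.new(:name)

  def self.columns
    @columns ||= %w[id username updated_at].map { |name| Column.new(name) }
  end
end

# Illustrative sketch of an "ignorable column" mixin (an assumption, not
# necessarily how GitLab implements it).
module IgnorableColumn
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Hide ignored columns from everything that reads the column cache.
    def columns
      super.reject { |column| ignored_columns.include?(column.name) }
    end

    # Names of the columns this model should pretend do not exist.
    def ignored_columns
      @ignored_columns ||= Set.new
    end

    def ignore_column(*names)
      ignored_columns.merge(names.map(&:to_s))
    end
  end
end

class User < FakeRecord
  include IgnorableColumn

  ignore_column :updated_at
end

puts User.columns.map(&:name).inspect
```

Because the model no longer reports `updated_at`, code that builds queries from the column list stops referencing it, so a post-deployment migration can drop the column while older processes are still being replaced.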