summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorSean McGivern <sean@gitlab.com>2017-11-17 16:02:10 +0000
committerSean McGivern <sean@gitlab.com>2017-11-17 16:02:10 +0000
commit634031222c309c3445909f3b95349d608b7bdf93 (patch)
tree72630b4a53ecbb91b3f70b7007dc38bbaab968cc
parenta4aeda59a80a4fdcd60a08146c3a936641ec5002 (diff)
downloadgitlab-ce-restrict-update-column-in-batches-for-large-tables.tar.gz
Add computed update docs for update_column_in_batchesrestrict-update-column-in-batches-for-large-tables
-rw-r--r--doc/development/migration_style_guide.md38
-rw-r--r--lib/gitlab/database/migration_helpers.rb9
2 files changed, 46 insertions, 1 deletions
diff --git a/doc/development/migration_style_guide.md b/doc/development/migration_style_guide.md
index 9b8ab5da74e..a235dd74909 100644
--- a/doc/development/migration_style_guide.md
+++ b/doc/development/migration_style_guide.md
@@ -198,7 +198,43 @@ end
Keep in mind that this operation can easily take 10-15 minutes to complete on
larger installations (e.g. GitLab.com). As a result you should only add default
-values if absolutely necessary.
+values if absolutely necessary. There is a RuboCop cop that will fail if this
+method is used on some tables that are very large on GitLab.com, which would
+cause other issues.
+
+## Updating an existing column
+
+To update an existing column to a particular value, you can use
+`update_column_in_batches` (`add_column_with_default` uses this internally to
+fill in the default value). This will split the updates into batches, so we
+don't update too many rows at in a single statement.
+
+This updates the column `foo` in the `projects` table to 10, where `some_column`
+is `'hello'`:
+
+```ruby
+update_column_in_batches(:projects, :foo, 10) do |table, query|
+ query.where(table[:some_column].eq('hello'))
+end
+```
+
+To perform a computed update, the value can be wrapped in `Arel.sql`, so Arel
+treats it as an SQL literal. The below example is the same as the one above, but
+the value is set to the product of the `bar` and `baz` columns:
+
+```ruby
+update_value = Arel.sql('bar * baz')
+
+update_column_in_batches(:projects, :foo, update_value) do |table, query|
+ query.where(table[:some_column].eq('hello'))
+end
+```
+
+Like `add_column_with_default`, there is a RuboCop cop to detect usage of this
+on large tables. In the case of `update_column_in_batches`, it may be acceptable
+to run on a large table, as long as it is only updating a small subset of the
+rows in the table, but do not ignore that without validating on the GitLab.com
+staging environment - or asking someone else to do so for you - beforehand.
## Integer column type
diff --git a/lib/gitlab/database/migration_helpers.rb b/lib/gitlab/database/migration_helpers.rb
index 2c35da8f1aa..c276c3566b4 100644
--- a/lib/gitlab/database/migration_helpers.rb
+++ b/lib/gitlab/database/migration_helpers.rb
@@ -220,6 +220,15 @@ module Gitlab
# column - The name of the column to update.
# value - The value for the column.
#
+ # The `value` argument is typically a literal. To perform a computed
+ # update, an Arel literal can be used instead:
+ #
+ # update_value = Arel.sql('bar * baz')
+ #
+ # update_column_in_batches(:projects, :foo, update_value) do |table, query|
+ # query.where(table[:some_column].eq('hello'))
+ # end
+ #
# Rubocop's Metrics/AbcSize metric is disabled for this method as Rubocop
# determines this method to be too complex while there's no way to make it
# less "complex" without introducing extra methods (which actually will