summaryrefslogtreecommitdiff
path: root/doc/development/database/setting_multiple_values.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database/setting_multiple_values.md')
-rw-r--r--doc/development/database/setting_multiple_values.md103
1 files changed, 103 insertions, 0 deletions
diff --git a/doc/development/database/setting_multiple_values.md b/doc/development/database/setting_multiple_values.md
new file mode 100644
index 00000000000..5569a0e10b7
--- /dev/null
+++ b/doc/development/database/setting_multiple_values.md
@@ -0,0 +1,103 @@
+# Setting Multiple Values
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/32921) in GitLab 13.5.
+
+Frequently, we will want to update multiple objects with new values for one
+or more columns. The obvious way to do this is using `Relation#update_all`:
+
+```ruby
+user.issues.open.update_all(due_date: 7.days.from_now) # (1)
+user.issues.update_all('relative_position = relative_position + 1') # (2)
+```
+
+But what do you do if you cannot express the update as either a static value (1)
+or as a calculation (2)?
+
+Thankfully we can use `UPDATE FROM` to express the need to update multiple rows
+with distinct values in a single query. One can either use a temporary table, or
+a Common Table Expression (CTE), and then use that as the source of the updates:
+
+```sql
+with updates(obj_id, new_title, new_weight) as (
+ values (1 :: integer, 'Very difficult issue' :: text, 8 :: integer),
+ (2, 'Very easy issue', 1)
+)
+update issues
+ set title = new_title, weight = new_weight
+ from updates
+ where id = obj_id
+```
+
+The bad news: There is no way to express this in ActiveRecord or even dropping
+down to ARel - the `UpdateManager` just does not support `update from`, so this
+is not expressible.
+
+The good news: We supply an abstraction to help you generate these kinds of
+updates, called `Gitlab::Database::BulkUpdate`. This constructs queries such as the
+above, and uses binding parameters to avoid SQL injection.
+
+## Usage
+
+To use this, we need:
+
+- the list of columns to update
+- a mapping from object/ID to the new values to set for that object
+- a way to determine the table for each object
+
+So for example, we can express the query above as:
+
+```ruby
+issue_a = Issue.find(..)
+issue_b = Issue.find(..)
+
+# Issues a single query:
+::Gitlab::Database::BulkUpdate.execute(%i[title weight], {
+ issue_a => { title: 'Very difficult issue', weight: 8 },
+ issue_b => { title: 'Very easy issue', weight: 1 }
+})
+```
+
+Here the table can be determined automatically, from calling
+`object.class.table_name`, so we don't need to provide anything.
+
+We can even pass heterogeneous sets of objects, if the updates all make sense
+for them:
+
+```ruby
+issue_a = Issue.find(..)
+issue_b = Issue.find(..)
+merge_request = MergeRequest.find(..)
+
+# Issues two queries
+::Gitlab::Database::BulkUpdate.execute(%i[title], {
+ issue_a => { title: 'A' },
+ issue_b => { title: 'B' },
+ merge_request => { title: 'B' }
+})
+```
+
+If your objects do not return the correct model class (perhaps because they are
+part of a union), then we need to specify this explicitly in a block:
+
+```ruby
+bazzes = params
+objects = Foo.from_union([
+ Foo.select("id, 'foo' as object_type").where(quux: true),
+ Bar.select("id, 'bar' as object_type").where(wibble: true)
+ ])
+# At this point, all the objects are instances of Foo, even the ones from the
+# Bar table
+mapping = objects.to_h { |obj| [obj, bazzes[obj.id] }
+
+# Issues at most 2 queries
+::Gitlab::Database::BulkUpdate.execute(%i[baz], mapping) do |obj|
+ obj.object_type.constantize
+end
+```
+
+## Caveats
+
+Note that this is a **very low level** tool, and operates on the raw column
+values. Enumerations and state fields must be translated into their underlying
+representations, for example, and nested associations are not supported. No
+validations or hooks will be called.