summaryrefslogtreecommitdiff
path: root/doc/development/database/setting_multiple_values.md
blob: 5569a0e10b78f62e10423ff3edfc4b112db4cd5f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
# Setting Multiple Values

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/32921) in GitLab 13.5.

Frequently, we will want to update multiple objects with new values for one
or more columns. The obvious way to do this is using `Relation#update_all`:

```ruby
user.issues.open.update_all(due_date: 7.days.from_now) # (1)
user.issues.update_all('relative_position = relative_position + 1') # (2)
```

But what do you do if you cannot express the update as either a static value (1)
or as a calculation (2)?

Thankfully we can use `UPDATE FROM` to express the need to update multiple rows
with distinct values in a single query. One can either use a temporary table, or
a Common Table Expression (CTE), and then use that as the source of the updates:

```sql
with updates(obj_id, new_title, new_weight) as (
  values (1 :: integer, 'Very difficult issue' :: text, 8 :: integer),
         (2, 'Very easy issue', 1)
)
update issues
  set title = new_title, weight = new_weight
  from updates
  where id = obj_id
```

The bad news: There is no way to express this in ActiveRecord or even dropping
down to ARel - the `UpdateManager` just does not support `update from`, so this
is not expressible.

The good news: We supply an abstraction to help you generate these kinds of
updates, called `Gitlab::Database::BulkUpdate`. This constructs queries such as the
above, and uses binding parameters to avoid SQL injection.

## Usage

To use this, we need:

- the list of columns to update
- a mapping from object/ID to the new values to set for that object
- a way to determine the table for each object

So for example, we can express the query above as:

```ruby
issue_a = Issue.find(..)
issue_b = Issue.find(..)

# Issues a single query:
::Gitlab::Database::BulkUpdate.execute(%i[title weight], {
  issue_a => { title: 'Very difficult issue', weight: 8 },
  issue_b => { title: 'Very easy issue', weight: 1 }
})
```

Here the table can be determined automatically, from calling
`object.class.table_name`, so we don't need to provide anything.

We can even pass heterogeneous sets of objects, if the updates all make sense
for them:

```ruby
issue_a = Issue.find(..)
issue_b = Issue.find(..)
merge_request = MergeRequest.find(..)

# Issues two queries
::Gitlab::Database::BulkUpdate.execute(%i[title], {
  issue_a => { title: 'A' },
  issue_b => { title: 'B' },
  merge_request => { title: 'B' }
})
```

If your objects do not return the correct model class (perhaps because they are
part of a union), then we need to specify this explicitly in a block:

```ruby
bazzes = params
objects = Foo.from_union([
    Foo.select("id, 'foo' as object_type").where(quux: true),
    Bar.select("id, 'bar' as object_type").where(wibble: true)
    ])
# At this point, all the objects are instances of Foo, even the ones from the
# Bar table
mapping = objects.to_h { |obj| [obj, bazzes[obj.id] }

# Issues at most 2 queries
::Gitlab::Database::BulkUpdate.execute(%i[baz], mapping) do |obj|
  obj.object_type.constantize
end
```

## Caveats

Note that this is a **very low level** tool, and operates on the raw column
values. Enumerations and state fields must be translated into their underlying
representations, for example, and nested associations are not supported. No
validations or hooks will be called.