summaryrefslogtreecommitdiff
path: root/doc/development/database/add_foreign_key_to_existing_column.md
blob: d74f826cc147ad489c6552dce5819a599a6d9d66 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
---
stage: Enablement
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Add a foreign key constraint to an existing column

Foreign keys ensure consistency between related database tables. The current database review process **always** encourages you to add [foreign keys](../foreign_keys.md) when creating tables that reference records from other tables.

Starting with Rails version 4, Rails includes migration helpers to add foreign key constraints
to database tables. Before Rails 4, the only way for ensuring some level of consistency was the
[`dependent`](https://guides.rubyonrails.org/association_basics.html#options-for-belongs-to-dependent)
option in the association definition. Ensuring data consistency on the application level could fail
in some unfortunate cases, so we might end up with inconsistent data in the table. This mostly affects
older tables, where we didn't have the framework support to ensure consistency on the database level.
These data inconsistencies can cause unexpected application behavior or bugs.

Adding a foreign key to an existing database column requires database structure changes and potential data changes. In case the table is in use, we should always assume that there is inconsistent data.

To add a foreign key constraint to an existing column:

1. GitLab version `N.M`: Add a `NOT VALID` foreign key constraint to the column to ensure GitLab doesn't create inconsistent records.
1. GitLab version `N.M`: Add a data migration, to fix or clean up existing records.
1. GitLab version `N.M+1`: Validate the whole table by making the foreign key `VALID`.

## Example

Consider the following table structures:

`users` table:

- `id` (integer, primary key)
- `name` (string)

`emails` table:

- `id` (integer, primary key)
- `user_id` (integer)
- `email` (string)

Express the relationship in `ActiveRecord`:

```ruby
class User < ActiveRecord::Base
  has_many :emails
end

class Email < ActiveRecord::Base
  belongs_to :user
end
```

Problem: when the user is removed, the email records related to the removed user stays in the `emails` table:

```ruby
user = User.find(1)
user.destroy

emails = Email.where(user_id: 1) # returns emails for the deleted user
```

### Prevent invalid records

Add a `NOT VALID` foreign key constraint to the table, which enforces consistency on the record changes.

[Using the `with_lock_retries` helper method is advised when performing operations on high-traffic tables](../migration_style_guide.md#when-to-use-the-helper-method),
in this case, if the table or the foreign table is a high-traffic table, we should use the helper method.

In the example above, you'd be still able to update records in the `emails` table. However, when you'd try to update the `user_id` with non-existent value, the constraint causes a database error.

Migration file for adding `NOT VALID` foreign key:

```ruby
class AddNotValidForeignKeyToEmailsUser < Gitlab::Database::Migration[1.0]
  def up
    # safe to use: it requires short lock on the table since we don't validate the foreign key
    add_foreign_key :emails, :users, on_delete: :cascade, validate: false
  end

  def down
    remove_foreign_key_if_exists :emails, column: :user_id
  end
end
```

WARNING:
Avoid using the `add_foreign_key` constraint more than once per migration file, unless the source and target tables are identical.

#### Data migration to fix existing records

The approach here depends on the data volume and the cleanup strategy. If we can find "invalid"
records by doing a database query and the record count is not high, then the data migration can
be executed in a Rails migration.

In case the data volume is higher (>1000 records), it's better to create a background migration. If unsure, please contact the database team for advice.

Example for cleaning up records in the `emails` table in a database migration:

```ruby
class RemoveRecordsWithoutUserFromEmailsTable < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  class Email < ActiveRecord::Base
    include EachBatch
  end

  def up
    Email.where('user_id NOT IN (SELECT id FROM users)').each_batch do |relation|
      relation.delete_all
    end
  end

  def down
    # Can be a no-op when data inconsistency is not affecting the pre and post deployment version of the application.
    # In this case we might have records in the `emails` table where the associated record in the `users` table is not there anymore.
  end
end
```

### Validate the foreign key

Validating the foreign key scans the whole table and makes sure that each relation is correct.

NOTE:
When using [background migrations](../background_migrations.md), foreign key validation should happen in the next GitLab release.

Migration file for validating the foreign key:

```ruby
# frozen_string_literal: true

class ValidateForeignKeyOnEmailUsers < Gitlab::Database::Migration[1.0]
  def up
    validate_foreign_key :emails, :user_id
  end

  def down
    # Can be safely a no-op if we don't roll back the inconsistent data.
  end
end
```