summaryrefslogtreecommitdiff
path: root/doc/development/database/not_null_constraints.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database/not_null_constraints.md')
-rw-r--r--doc/development/database/not_null_constraints.md217
1 files changed, 217 insertions, 0 deletions
diff --git a/doc/development/database/not_null_constraints.md b/doc/development/database/not_null_constraints.md
new file mode 100644
index 00000000000..e4dec2afa10
--- /dev/null
+++ b/doc/development/database/not_null_constraints.md
@@ -0,0 +1,217 @@
+# `NOT NULL` constraints
+
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.
+
+All attributes that should not have `NULL` as a value, should be defined as `NOT NULL`
+columns in the database.
+
+Depending on the application logic, `NOT NULL` columns should either have a `presence: true`
+validation defined in their Model or have a default value as part of their database definition.
+As an example, the latter can be true for boolean attributes that should always have a non-`NULL`
+value, but have a well defined default value that the application does not need to enforce each
+time (for example, `active=true`).
+
+## Create a new table with `NOT NULL` columns
+
+When adding a new table, all `NOT NULL` columns should be defined as such directly inside `create_table`.
+
+For example, consider a migration that creates a table with two `NOT NULL` columns,
+`db/migrate/20200401000001_create_db_guides.rb`:
+
+```ruby
+class CreateDbGuides < ActiveRecord::Migration[6.0]
+ DOWNTIME = false
+
+ def change
+ create_table :db_guides do |t|
+ t.bigint :stars, default: 0, null: false
+ t.bigint :guide, null: false
+ end
+ end
+end
+```
+
+## Add a `NOT NULL` column to an existing table
+
+With PostgreSQL 11 being the minimum version since GitLab 13.0, adding columns with `NULL` and/or
+default values has become much easier and the standard `add_column` helper should be used in all cases.
+
+For example, consider a migration that adds a new `NOT NULL` column `active` to table `db_guides`,
+`db/migrate/20200501000001_add_active_to_db_guides.rb`:
+
+```ruby
+class AddExtendedTitleToSprints < ActiveRecord::Migration[6.0]
+ DOWNTIME = false
+
+ def change
+ add_column :db_guides, :active, :boolean, default: true, null: false
+ end
+end
+```
+
+## Add a `NOT NULL` constraint to an existing column
+
+Adding `NOT NULL` to existing database columns requires multiple steps split into at least two
+different releases:
+
+1. Release `N.M` (current release)
+
+ - Ensure the constraint is enforced at the application level (i.e. add a model validation).
+ - Add a post-deployment migration to add the `NOT NULL` constraint with `validate: false`.
+ - Add a post-deployment migration to fix the existing records.
+
+ NOTE: **Note:**
+ Depending on the size of the table, a background migration for cleanup could be required in the next release.
+ See the [`NOT NULL` constraints on large tables](not_null_constraints.md#not-null-constraints-on-large-tables) section for more information.
+
+ - Create an issue for the next milestone to validate the `NOT NULL` constraint.
+
+1. Release `N.M+1` (next release)
+
+ - Validate the `NOT NULL` constraint using a post-deployment migration.
+
+### Example
+
+Considering a given release milestone, such as 13.0, a model validation has been added into `epic.rb`
+to require a description:
+
+```ruby
+class Epic < ApplicationRecord
+ validates :description, presence: true
+end
+```
+
+The same constraint should be added at the database level for consistency purposes.
+We only want to enforce the `NOT NULL` constraint without setting a default, as we have decided
+that all epics should have a user-generated description.
+
+After checking our production database, we know that there are `epics` with `NULL` descriptions,
+so we can not add and validate the constraint in one step.
+
+NOTE: **Note:**
+Even if we did not have any epic with a `NULL` description, another instance of GitLab could have
+such records, so we would follow the same process either way.
+
+#### Prevent new invalid records (current release)
+
+We first add the `NOT NULL` constraint with a `NOT VALID` parameter, which enforces consistency
+when new records are inserted or current records are updated.
+
+In the example above, the existing epics with a `NULL` description will not be affected and you'll
+still be able to update records in the `epics` table. However, when you try to update or insert
+an epic without providing a description, the constraint causes a database error.
+
+Adding or removing a `NOT NULL` clause requires that any application changes are deployed _first_.
+Thus, adding a `NOT NULL` constraint to an existing column should happen in a post-deployment migration.
+
+Still in our example, for the 13.0 milestone example (current), we add the `NOT NULL` constraint
+with `validate: false` in a post-deployment migration,
+`db/post_migrate/20200501000001_add_not_null_constraint_to_epics_description.rb`:
+
+```ruby
+class AddNotNullConstraintToEpicsDescription < ActiveRecord::Migration[6.0]
+ include Gitlab::Database::MigrationHelpers
+ DOWNTIME = false
+
+ disable_ddl_transaction!
+
+ def up
+ # This will add the `NOT NULL` constraint WITHOUT validating it
+ add_not_null_constraint :epics, :description, validate: false
+ end
+
+ def down
+ # Down is required as `add_not_null_constraint` is not reversible
+ remove_not_null_constraint :epics, :description
+ end
+end
+```
+
+#### Data migration to fix existing records (current release)
+
+The approach here depends on the data volume and the cleanup strategy. The number of records that
+must be fixed on GitLab.com is a nice indicator that will help us decide whether to use a
+post-deployment migration or a background data migration:
+
+- If the data volume is less than `1000` records, then the data migration can be executed within the post-migration.
+- If the data volume is higher than `1000` records, it's advised to create a background migration.
+
+When unsure about which option to use, please contact the Database team for advice.
+
+Back to our example, the epics table is not considerably large nor frequently accessed,
+so we are going to add a post-deployment migration for the 13.0 milestone (current),
+`db/post_migrate/20200501000002_cleanup_epics_with_null_description.rb`:
+
+```ruby
+class CleanupEpicsWithNullDescription < ActiveRecord::Migration[6.0]
+ include Gitlab::Database::MigrationHelpers
+
+ # With BATCH_SIZE=1000 and epics.count=29500 on GitLab.com
+ # - 30 iterations will be run
+ # - each requires on average ~150ms
+ # Expected total run time: ~5 seconds
+ BATCH_SIZE = 1000
+
+ disable_ddl_transaction!
+
+ class Epic < ActiveRecord::Base
+ include EachBatch
+
+ self.table_name = 'epics'
+ end
+
+ def up
+ Epic.each_batch(of: BATCH_SIZE) do |relation|
+ relation.
+ where('description IS NULL').
+ update_all(description: 'No description')
+ end
+ end
+
+ def down
+ # no-op : can't go back to `NULL` without first dropping the `NOT NULL` constraint
+ end
+end
+```
+
+#### Validate the text limit (next release)
+
+Validating the `NOT NULL` constraint will scan the whole table and make sure that each record is correct.
+
+Still in our example, for the 13.1 milestone (next), we run the `validate_not_null_constraint`
+migration helper in a final post-deployment migration,
+`db/post_migrate/20200601000001_validate_not_null_constraint_on_epics_description.rb`:
+
+```ruby
+class ValidateNotNullConstraintOnEpicsDescription < ActiveRecord::Migration[6.0]
+ include Gitlab::Database::MigrationHelpers
+ DOWNTIME = false
+
+ disable_ddl_transaction!
+
+ def up
+ validate_not_null_constraint :epics, :description
+ end
+
+ def down
+ # no-op
+ end
+end
+```
+
+## `NOT NULL` constraints on large tables
+
+If you have to clean up a text column for a really [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/migration_helpers.rb#L12)
+(for example, the `artifacts` in `ci_builds`), your background migration will go on for a while and
+it will need an additional [background migration cleaning up](../background_migrations.md#cleaning-up)
+in the release after adding the data migration.
+
+In that rare case you will need 3 releases end-to-end:
+
+1. Release `N.M` - Add the `NOT NULL` constraint and the background-migration to fix the existing records.
+1. Release `N.M+1` - Cleanup the background migration.
+1. Release `N.M+2` - Validate the `NOT NULL` constraint.
+
+For these cases, please consult the database team early in the update cycle. The `NOT NULL`
+constraint may not be required or other options could exist that do not affect really large
+or frequently accessed tables.