diff options
Diffstat (limited to 'doc/development/database/adding_database_indexes.md')
-rw-r--r-- | doc/development/database/adding_database_indexes.md | 63 |
1 files changed, 57 insertions, 6 deletions
diff --git a/doc/development/database/adding_database_indexes.md b/doc/development/database/adding_database_indexes.md index e1d5a7af6d9..1e3a1de9b69 100644 --- a/doc/development/database/adding_database_indexes.md +++ b/doc/development/database/adding_database_indexes.md @@ -214,6 +214,45 @@ def down end ``` +## Analyzing a new index before a batched background migration + +Sometimes it is necessary to add an index to support a [batched background migration](batched_background_migrations.md). +It is commonly done by creating two [post deployment migrations](post_deployment_migrations.md): + +1. Add the new index, often a [temporary index](#temporary-indexes). +1. [Queue the batched background migration](batched_background_migrations.md#queueing). + +In most cases, no additional work is needed. The new index is created and is used +as expected when queuing and executing the batched background migration. + +[Expression indexes](https://www.postgresql.org/docs/current/indexes-expressional.html), +however, do not generate statistics for the new index on creation. Autovacuum +eventually runs `ANALYZE`, and updates the statistics so the new index is used. +Run `ANALYZE` explicitly only if it is needed right after the index +is created, such as in the background migration scenario described above. + +To trigger `ANALYZE` after the index is created, update the index creation migration +to analyze the table: + +```ruby +# in db/post_migrate/ + +INDEX_NAME = 'tmp_index_projects_on_owner_and_lower_name_where_emails_disabled' +TABLE = :projects + +disable_ddl_transaction! + +def up + add_concurrent_index TABLE, '(creator_id, lower(name))', where: 'emails_disabled = false', name: INDEX_NAME + + connection.execute("ANALYZE #{TABLE}") +end +``` + +`ANALYZE` should only be run in post deployment migrations and should not target +[large tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3). +If this behavior is needed on a larger table, ask for assistance in the `#database` Slack channel. + ## Indexes for partitioned tables Indexes [cannot be created](https://www.postgresql.org/docs/15/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-MAINTENANCE) @@ -254,7 +293,7 @@ end For very large tables, index creation can be a challenge to manage. While `add_concurrent_index` creates indexes in a way that does not block -normal traffic, it can still be problematic when index creation runs for +ordinary traffic, it can still be problematic when index creation runs for many hours. Necessary database operations like `autovacuum` cannot run, and on GitLab.com, the deployment process is blocked waiting for index creation to finish. @@ -271,8 +310,13 @@ index creation can proceed at a lower level of risk. ### Schedule the index to be created -Create an MR with a post-deployment migration which prepares the index -for asynchronous creation. An example of creating an index using +1. Create a merge request containing a post-deployment migration, which prepares + the index for asynchronous creation. +1. [Create a follow-up issue](https://gitlab.com/gitlab-org/gitlab/-/issues/new?issuable_template=Synchronous%20Database%20Index) + to add a migration that creates the index synchronously. +1. In the merge request that prepares the asynchronous index, add a comment mentioning the follow-up issue. + +An example of creating an index using the asynchronous index helpers can be seen in the block below. This migration enters the index name and definition into the `postgres_async_indexes` table. The process that runs on weekends pulls indexes from this @@ -283,6 +327,7 @@ table and attempt to create them. INDEX_NAME = 'index_ci_builds_on_some_column' +# TODO: Index to be created synchronously in https://gitlab.com/gitlab-org/gitlab/-/issues/XXXXX def up prepare_async_index :ci_builds, :some_column, name: INDEX_NAME end @@ -351,7 +396,7 @@ Use the asynchronous index helpers on your local environment to test changes for For very large tables, index destruction can be a challenge to manage. While `remove_concurrent_index` removes indexes in a way that does not block -normal traffic, it can still be problematic if index destruction runs for +ordinary traffic, it can still be problematic if index destruction runs for during `autovacuum`. Necessary database operations like `autovacuum` cannot run, and the deployment process on GitLab.com is blocked while waiting for index destruction to finish. @@ -366,8 +411,13 @@ index destruction can proceed at a lower level of risk. ### Schedule the index to be removed -Create an MR with a post-deployment migration which prepares the index -for asynchronous destruction. For example. to destroy an index using +1. Create a merge request containing a post-deployment migration, which prepares + the index for asynchronous destruction. +1. [Create a follow-up issue](https://gitlab.com/gitlab-org/gitlab/-/issues/new?issuable_template=Synchronous%20Database%20Index) + to add a migration that destroys the index synchronously. +1. In the merge request that prepares the asynchronous index removal, add a comment mentioning the follow-up issue. + +For example, to destroy an index using the asynchronous index helpers: ```ruby @@ -375,6 +425,7 @@ the asynchronous index helpers: INDEX_NAME = 'index_ci_builds_on_some_column' +# TODO: Index to be destroyed synchronously in https://gitlab.com/gitlab-org/gitlab/-/issues/XXXXX def up prepare_async_index_removal :ci_builds, :some_column, name: INDEX_NAME end |