| | | |
|---|---|---|
| author | GitLab Bot <gitlab-bot@gitlab.com> | 2023-04-20 11:43:17 +0000 |
| committer | GitLab Bot <gitlab-bot@gitlab.com> | 2023-04-20 11:43:17 +0000 |
| commit | dfc94207fec2d84314b1a5410cface22e8b369bd (patch) | |
| tree | c54022f61ced104305889a64de080998a0dc773b | /doc/development/database |
| parent | b874efeff674f6bf0355d5d242ecf81c6f7155df (diff) | |
| download | gitlab-ce-15.11.0-rc42.tar.gz | |
Add latest changes from gitlab-org/gitlab@15-11-stable-ee (v15.11.0-rc42)
Diffstat (limited to 'doc/development/database')
-rw-r--r-- | doc/development/database/adding_database_indexes.md | 34 | ||||
-rw-r--r-- | doc/development/database/avoiding_downtime_in_migrations.md | 2 | ||||
-rw-r--r-- | doc/development/database/batched_background_migrations.md | 37 | ||||
-rw-r--r-- | doc/development/database/ci_mirrored_tables.md | 18 | ||||
-rw-r--r-- | doc/development/database/index.md | 1 | ||||
-rw-r--r-- | doc/development/database/iterating_tables_in_batches.md | 62 | ||||
-rw-r--r-- | doc/development/database/load_balancing.md | 59 | ||||
-rw-r--r-- | doc/development/database/loose_foreign_keys.md | 2 | ||||
-rw-r--r-- | doc/development/database/multiple_databases.md | 20 | ||||
-rw-r--r-- | doc/development/database/strings_and_the_text_data_type.md | 39 | ||||
-rw-r--r-- | doc/development/database/table_partitioning.md | 134 | ||||
-rw-r--r-- | doc/development/database/transaction_guidelines.md | 2 |
12 files changed, 369 insertions, 41 deletions
diff --git a/doc/development/database/adding_database_indexes.md b/doc/development/database/adding_database_indexes.md index 1e3a1de9b69..7b29b1b14de 100644 --- a/doc/development/database/adding_database_indexes.md +++ b/doc/development/database/adding_database_indexes.md @@ -38,6 +38,15 @@ when adding a new index: 1. Is the overhead of maintaining the index worth the reduction in query timings? +In some situations, an index might not be required: + +- The table is small (less than `1,000` records) and it's not expected to exponentially grow in size. +- Any existing indexes filter out enough rows. +- The reduction in query timings after the index is added is not significant. + +Additionally, wide indexes are not required to match all filter criteria of queries. We just need +to cover enough columns so that the index lookup has a small enough selectivity. + ## Re-using Queries The first step is to make sure your query re-uses as many existing indexes as @@ -183,6 +192,29 @@ for `index_exists?`, causing a required index to not be created properly. By always requiring a name for certain types of indexes, the chance of error is greatly reduced. +## Testing for existence of indexes + +The easiest way to test for existence of an index by name is to use the `index_name_exists?` method, but the `index_exists?` method can also be used with a name option. For example: + +```ruby +class MyMigration < Gitlab::Database::Migration[2.1] + INDEX_NAME = 'index_name' + + def up + # an index must be conditionally created due to schema inconsistency + unless index_exists?(:table_name, :column_name, name: INDEX_NAME) + add_index :table_name, :column_name, name: INDEX_NAME + end + end + + def down + # no op + end +end +``` + +Keep in mind that concurrent index helpers like `add_concurrent_index`, `remove_concurrent_index`, and `remove_concurrent_index_by_name` already perform existence checks internally. + ## Temporary indexes There may be times when an index is only needed temporarily. 
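The selectivity guidance earlier on this page — a wide index need not match every filter in a query, only enough columns to keep the lookup selective — can be sketched with toy data. The column names and row counts below are hypothetical, purely for illustration:

```ruby
# Toy illustration of index selectivity: the fraction of rows a lookup
# matches. Column names and data are hypothetical.
ROWS = [
  { status: 'active',   project_id: 1 },
  { status: 'active',   project_id: 2 },
  { status: 'archived', project_id: 1 },
  { status: 'active',   project_id: 1 }
].freeze

def selectivity(rows, **filters)
  matched = rows.count { |row| filters.all? { |column, value| row[column] == value } }
  matched.to_f / rows.length
end

# An index on status alone may narrow the candidate rows enough,
# even though queries also filter on project_id:
puts selectivity(ROWS, status: 'active')                # 0.75
puts selectivity(ROWS, status: 'active', project_id: 1) # 0.5
```

In a real table the same reasoning applies at scale: if filtering on one indexed column already leaves only a handful of rows, adding the remaining filter columns to the index buys little.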
@@ -448,7 +480,7 @@ You must test the database index changes locally before creating a merge request the post-deploy migration has been executed in the production database. For more information, see [How to determine if a post-deploy migration has been executed on GitLab.com](https://gitlab.com/gitlab-org/release/docs/-/blob/master/general/post_deploy_migration/readme.md#how-to-determine-if-a-post-deploy-migration-has-been-executed-on-gitlabcom). 1. In the case of an [index removed asynchronously](#schedule-the-index-to-be-removed), wait - until the next week so that the index can be created over a weekend. + until the next week so that the index can be removed over a weekend. 1. Use Database Lab [to check if removal was successful](database_lab.md#checking-indexes). [Database Lab](database_lab.md) should report an error when trying to find the removed index. If not, the index may still exist. diff --git a/doc/development/database/avoiding_downtime_in_migrations.md b/doc/development/database/avoiding_downtime_in_migrations.md index 8e1eeee7a42..25310554c24 100644 --- a/doc/development/database/avoiding_downtime_in_migrations.md +++ b/doc/development/database/avoiding_downtime_in_migrations.md @@ -605,7 +605,7 @@ See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_request ### Remove the trigger and old `integer` columns (release N + 2) Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper, -drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)). +drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/70351)). 
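For context on why the `integer`-to-`bigint` conversion above is needed: a signed 4-byte PostgreSQL `integer` column tops out just above two billion, so a busy table's `id` sequence can exhaust it. A minimal sketch of the two limits (plain Ruby, illustrative only):

```ruby
# Signed 4-byte (PostgreSQL `integer`) and 8-byte (`bigint`) column limits.
INT4_MAX = 2**31 - 1 # 2_147_483_647
INT8_MAX = 2**63 - 1 # 9_223_372_036_854_775_807

# Once a table's id sequence approaches INT4_MAX, inserts start failing,
# which is what the integer-to-bigint conversion guards against.
puts INT4_MAX
puts INT8_MAX
```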
### Remove ignore rules (release N + 3) diff --git a/doc/development/database/batched_background_migrations.md b/doc/development/database/batched_background_migrations.md index 67dccd99a6c..7f3c5889017 100644 --- a/doc/development/database/batched_background_migrations.md +++ b/doc/development/database/batched_background_migrations.md @@ -597,6 +597,37 @@ for more details. more pressure on DB than you expect. Measure on staging, or ask someone to measure on production. 1. Know how much time is required to run the batched background migration. +1. Be careful when silently rescuing exceptions inside job classes. This may lead to + jobs being marked as successful, even in a failure scenario. + + ```ruby + # good + def perform + each_sub_batch do |sub_batch| + sub_batch.update_all(name: 'My Name') + end + end + + # acceptable + def perform + each_sub_batch do |sub_batch| + sub_batch.update_all(name: 'My Name') + rescue Exception => error + logger.error(message: error.message, class: error.class) + + raise + end + end + + # bad + def perform + each_sub_batch do |sub_batch| + sub_batch.update_all(name: 'My Name') + rescue Exception => error + logger.error(message: error.message, class: self.class.name) + end + end + ``` ## Additional tips and strategies @@ -768,7 +799,7 @@ To do that, you need to: #### 1. Find the batch `start_id` and `end_id` -You should be able to find those in [Kibana][#viewing-failure-error-logs]. +You should be able to find those in [Kibana](#viewing-failure-error-logs). #### 2. 
Create a regular migration @@ -809,7 +840,7 @@ index b8d1d21a0d2d2a23d9e8c8a0a17db98ed1ed40b7..912e20659a6919f771045178c6682856 +++ b/lib/gitlab/database/migration_helpers/restrict_gitlab_schema.rb @@ -55,7 +55,7 @@ def unmatched_schemas end - + def allowed_schemas_for_connection - Gitlab::Database.gitlab_schemas_for_connection(connection) + Gitlab::Database.gitlab_schemas_for_connection(connection) << :gitlab_ci @@ -828,7 +859,7 @@ index 4ae3622479f0800c0553959e132143ec9051898e..d556ec7f55adae9d46a56665ce02de78 @@ -79,7 +79,7 @@ def restrict_to_dml_only(parsed) tables = self.dml_tables(parsed) schemas = self.dml_schemas(tables) - + - if (schemas - self.allowed_gitlab_schemas).any? + if (schemas - (self.allowed_gitlab_schemas << :gitlab_ci)).any? raise DMLAccessDeniedError, \ diff --git a/doc/development/database/ci_mirrored_tables.md b/doc/development/database/ci_mirrored_tables.md index bf3a744b936..1e37739bdc4 100644 --- a/doc/development/database/ci_mirrored_tables.md +++ b/doc/development/database/ci_mirrored_tables.md @@ -76,9 +76,8 @@ the source and the target tables in sync: 1. Deleting namespaces/projects. 
```mermaid -graph TD - - subgraph "CI database (tables)" +graph LR + subgraph CI["CI Tables"] E[other CI tables] F{queries with joins allowed} G[ci_project_mirrors] @@ -89,17 +88,18 @@ graph TD F---H end - A---B - B---C - B---D + Main["Main Tables"]---L["⛔ ← Joins are not allowed → ⛔"] + L---CI -L["⛔ ← Joins are not allowed → ⛔"] - - subgraph "Main database (tables)" + subgraph Main["Main Tables"] A[other main tables] B{queries with joins allowed} C[projects] D[namespaces] + + A---B + B---C + B---D end ``` diff --git a/doc/development/database/index.md b/doc/development/database/index.md index 2cb8509e203..f532e054849 100644 --- a/doc/development/database/index.md +++ b/doc/development/database/index.md @@ -64,6 +64,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w - [Hash indexes](hash_indexes.md) - [Insert into tables in batches](insert_into_tables_in_batches.md) - [Iterating tables in batches](iterating_tables_in_batches.md) +- [Load balancing](load_balancing.md) - [`NOT NULL` constraints](not_null_constraints.md) - [Ordering table columns](ordering_table_columns.md) - [Pagination guidelines](pagination_guidelines.md) diff --git a/doc/development/database/iterating_tables_in_batches.md b/doc/development/database/iterating_tables_in_batches.md index 6357bed8b00..a927242e8d8 100644 --- a/doc/development/database/iterating_tables_in_batches.md +++ b/doc/development/database/iterating_tables_in_batches.md @@ -44,9 +44,13 @@ all of the arguments that `in_batches` supports. You should always use ## Iterating over non-unique columns -One should proceed with extra caution. When you iterate over an attribute that is not unique, -even with the applied max batch size, there is no guarantee that the resulting batches do not -surpass it. 
The following snippet demonstrates this situation when one attempt to select +You should not use the `each_batch` method with a non-unique column (in the context of the relation) as it +[may result in an infinite loop](https://gitlab.com/gitlab-org/gitlab/-/issues/285097). +Additionally, the inconsistent batch sizes cause performance issues when you +iterate over non-unique columns. Even when you apply a max batch size +when iterating over an attribute, there's no guarantee that the resulting +batches don't surpass it. The following snippet demonstrates this situation +when you attempt to select `Ci::Build` entries for users with `id` between `1` and `10,000`, the database returns `1 215 178` matching rows. @@ -465,6 +469,58 @@ Issue.each_batch(of: 1000) do |relation| end ``` +### Counting records + +For tables with a large amount of data, counting records through queries can result +in timeouts. The `EachBatch` module provides an alternative way to iteratively count +records. The downside of using `each_batch` is the extra count query which is executed +on the yielded relation object. + +The `each_batch_count` method is a more efficient approach that eliminates the need +for the extra count query. By invoking this method, the iteration process can be +paused and resumed as needed. This feature is particularly useful in situations +where error budget violations are triggered after five minutes, such as when performing +counting operations within Sidekiq workers. 
+ +To illustrate, counting records using `EachBatch` involves invoking an additional +count query as follows: + +```ruby +count = 0 + +Issue.each_batch do |relation| + count += relation.count +end + +puts count +``` + +On the other hand, the `each_batch_count` method enables the counting process to be +performed more efficiently (counting is part of the iteration query) without invoking +an extra count query: + +```ruby +count, _last_value = Issue.each_batch_count # last value can be ignored here +``` + +Furthermore, the `each_batch_count` method allows the counting process to be paused +and resumed at any point. This capability is demonstrated in the following code snippet: + +```ruby +stop_at = Time.current + 3.minutes + +count, last_value = Issue.each_batch_count do + Time.current > stop_at # condition for stopping the counting +end + +# Continue the counting later +stop_at = Time.current + 3.minutes + +count, last_value = Issue.each_batch_count(last_count: count, last_value: last_value) do + Time.current > stop_at +end +``` + ### `EachBatch` vs `BatchCount` When adding new counters for Service Ping, the preferred way to count records is using the diff --git a/doc/development/database/load_balancing.md b/doc/development/database/load_balancing.md new file mode 100644 index 00000000000..f623ad1eab0 --- /dev/null +++ b/doc/development/database/load_balancing.md @@ -0,0 +1,59 @@ +--- +stage: Data Stores +group: Database +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +# Database load balancing + +With database load balancing, read-only queries can be distributed across multiple +PostgreSQL nodes to increase performance. + +This documentation provides a technical overview on how database load balancing +is implemented in GitLab Rails and Sidekiq. + +## Nomenclature + +1. **Host**: Each database host. 
It could be a primary or a replica. +1. **Primary**: Primary PostgreSQL host that is used for write-only and read-and-write operations. +1. **Replica**: Secondary PostgreSQL hosts that are used for read-only operations. +1. **Workload**: A Rails request or a Sidekiq job that requires database connections. + +## Components + +A few Ruby classes are involved in the load balancing process. All of them are +in the namespace `Gitlab::Database::LoadBalancing`: + +1. `Host` +1. `LoadBalancer` +1. `ConnectionProxy` +1. `Session` + +Each workload begins with a new instance of `Gitlab::Database::LoadBalancing::Session`. +The `Session` keeps track of the database operations that have been performed. It then +determines if the workload requires a connection to either the primary host or a replica host. + +When the workload requires a database connection through `ActiveRecord`, +`ConnectionProxy` first redirects the connection request to `LoadBalancer`. +`ConnectionProxy` requests either a `read` or `read_write` connection from the `LoadBalancer` +depending on a few criteria: + +1. Whether the query is read-only or requires a write. +1. Whether the `Session` has recorded a write operation previously. +1. Whether any special blocks have been used to prefer primary or replica, such as: + - `use_primary` + - `ignore_writes` + - `use_replicas_for_read_queries` + - `fallback_to_replicas_for_ambiguous_queries` + +`LoadBalancer` then yields the requested connection from the respective database connection pool. +It yields either: + +- A `read_write` connection from the primary's connection pool. +- A `read` connection from the replicas' connection pools. + +When responding to a request for a `read` connection, `LoadBalancer` would +first attempt to load balance the connection across the replica hosts. +It looks for the next `online` replica host and yields a connection from the host's connection pool.
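The replica selection described above can be sketched as a simple round-robin over online hosts. The `Host` and `ReplicaPool` classes below are toy stand-ins; GitLab's real logic lives in `Gitlab::Database::LoadBalancing::LoadBalancer` and tracks considerably more state:

```ruby
# Toy sketch of round-robin replica selection. A host that is lagging too
# far behind the primary would report online? == false and be skipped.
Host = Struct.new(:name, :online) do
  def online?
    online
  end
end

class ReplicaPool
  def initialize(hosts)
    @hosts = hosts
    @index = 0
  end

  # Cycle through the hosts, returning the next one that is online,
  # or nil so the caller can fall back to the primary.
  def next_online_host
    @hosts.length.times do
      host = @hosts[@index % @hosts.length]
      @index += 1
      return host if host.online?
    end
    nil
  end
end

pool = ReplicaPool.new([
  Host.new('replica-1', true),
  Host.new('replica-2', false), # lagging: treated as offline
  Host.new('replica-3', true)
])

3.times { puts pool.next_online_host.name }
# replica-1, replica-3, then back around to replica-1
```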
+A replica host is considered `online` if it is up-to-date with the primary, based on +either the replication lag size or time. The thresholds for these requirements are configurable. diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md index daa022a3de2..91a22d8c26b 100644 --- a/doc/development/database/loose_foreign_keys.md +++ b/doc/development/database/loose_foreign_keys.md @@ -64,7 +64,7 @@ The tool ensures that all aspects of swapping a foreign key are covered. This in - Creating a migration to remove a foreign key. - Updating `db/structure.sql` with the new migration. -- Updating `lib/gitlab/database/gitlab_loose_foreign_keys.yml` to add the new loose foreign key. +- Updating `config/gitlab_loose_foreign_keys.yml` to add the new loose foreign key. - Creating or updating a model's specs to ensure that the loose foreign key is properly supported. The tool is located at `scripts/decomposition/generate-loose-foreign-key`: diff --git a/doc/development/database/multiple_databases.md b/doc/development/database/multiple_databases.md index c1b30a4cbbe..6adfdc90cf2 100644 --- a/doc/development/database/multiple_databases.md +++ b/doc/development/database/multiple_databases.md @@ -545,7 +545,7 @@ end ``` Don't hesitate to reach out to the -[pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/pods/) +[Pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/tenant-scale/) for advice. ##### Avoid `dependent: :nullify` and `dependent: :destroy` across databases @@ -580,6 +580,24 @@ or records that point to nowhere, which might lead to bugs. As such we created ["loose foreign keys"](loose_foreign_keys.md) which is an asynchronous process of cleaning up orphaned records. +## Testing for multiple databases + +In our testing CI pipelines, we test GitLab by default with multiple databases set up, using +both `main` and `ci` databases. 
But in merge requests, for example when we modify some database-related code or +add the label `~"pipeline:run-single-db"` to the MR, we additionally run our tests in +[two other database modes](../pipelines/index.md#single-database-testing): +`single-db` and `single-db-ci-connection`. + +To handle situations where our tests need to run in specific database modes, we have some RSpec helpers +to limit the modes where tests can run, and skip them on any other modes. + +| Helper name | Test runs | +|---------------------------------------------| --- | +| `skip_if_shared_database(:ci)` | On **multiple databases** | +| `skip_if_database_exists(:ci)` | On **single-db** and **single-db-ci-connection** | +| `skip_if_multiple_databases_are_setup(:ci)` | Only on **single-db** | +| `skip_if_multiple_databases_not_setup(:ci)` | On **single-db-ci-connection** and **multiple databases** | + ## Locking writes on the tables that don't belong to the database schemas When the CI database is promoted and the two databases are fully split, diff --git a/doc/development/database/strings_and_the_text_data_type.md b/doc/development/database/strings_and_the_text_data_type.md index 47e89c1ce0f..5cd325bfa56 100644 --- a/doc/development/database/strings_and_the_text_data_type.md +++ b/doc/development/database/strings_and_the_text_data_type.md @@ -68,9 +68,17 @@ is held for a brief amount of time, the time `add_column` needs to complete its depending on how frequently the table is accessed. For example, acquiring an exclusive lock for a very frequently accessed table may take minutes in GitLab.com and requires the use of `with_lock_retries`. -For these reasons, it is advised to add the text limit on a separate migration than the `add_column` one. +When adding a text limit, transactions must be disabled with `disable_ddl_transaction!`. This means adding the column is not rolled back +in case the migration fails afterwards. 
An attempt to re-run the migration will raise an error because of the already existing column. -For example, consider a migration that adds a new text column `extended_title` to table `sprints`, +For these reasons, adding a text column to an existing table can be done by either: + +- [Add the column and limit in separate migrations.](#add-the-column-and-limit-in-separate-migrations) +- [Add the column and limit in one migration with checking if the column already exists.](#add-the-column-and-limit-in-one-migration-with-checking-if-the-column-already-exists) + +### Add the column and limit in separate migrations + +Consider a migration that adds a new text column `extended_title` to table `sprints`, `db/migrate/20200501000001_add_extended_title_to_sprints.rb`: ```ruby @@ -103,6 +111,33 @@ class AddTextLimitToSprintsExtendedTitle < Gitlab::Database::Migration[2.1] end ``` +### Add the column and limit in one migration with checking if the column already exists + +Consider a migration that adds a new text column `extended_title` to table `sprints`, +`db/migrate/20200501000001_add_extended_title_to_sprints.rb`: + +```ruby +class AddExtendedTitleToSprints < Gitlab::Database::Migration[2.1] + disable_ddl_transaction! 
+ + def up + with_lock_retries do + add_column :sprints, :extended_title, :text, if_not_exists: true + end + + add_text_limit :sprints, :extended_title, 512 + end + + def down + remove_text_limit :sprints, :extended_title + + with_lock_retries do + remove_column :sprints, :extended_title, if_exists: true + end + end +end +``` + ## Add a text limit constraint to an existing column Adding text limits to existing database columns requires multiple steps split into at least two different releases: diff --git a/doc/development/database/table_partitioning.md b/doc/development/database/table_partitioning.md index 0d5e3c233f6..81733b126b6 100644 --- a/doc/development/database/table_partitioning.md +++ b/doc/development/database/table_partitioning.md @@ -6,6 +6,13 @@ info: To determine the technical writer assigned to the Stage/Group associated w # Database table partitioning +WARNING: +If you have questions not answered below, check for and add them +to [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/398650). +Tag `@gitlab-org/database-team/triage` and we'll get back to you with an +answer as soon as possible. If you get an answer in Slack, document +it on the issue as well so we can update this document in the future. + Table partitioning is a powerful database feature that allows a table's data to be split into smaller physical tables that act as a single large table. If the application is designed to work with partitioning in mind, @@ -32,31 +39,38 @@ several releases. Due to the limitations of partitioning and the related migrations, you should understand how partitioning fits your use case before attempting to leverage this feature. -## Determining when to use partitioning +## Determine when to use partitioning While partitioning can be very useful when properly applied, it's imperative to identify if the data and workload of a table naturally fit a -partitioning scheme. 
There are a few details you have to understand -to decide if partitioning is a good fit for your particular -problem. - -First, a table is partitioned on a partition key, which is a column or -set of columns which determine how the data is split across the -partitions. The partition key is used by the database when reading or -writing data, to decide which partitions must be accessed. The -partition key should be a column that would be included in a `WHERE` -clause on almost all queries accessing that table. - -Second, it's necessary to understand the strategy the database uses -to split the data across the partitions. The scheme supported by the -GitLab migration helpers is date-range partitioning, where each partition -in the table contains data for a single month. In this case, the partitioning -key must be a timestamp or date column. In order for this type of +partitioning scheme. Understand a few details to decide if partitioning +is a good fit for your particular problem: + +- **Table partitioning**. A table is partitioned on a partition key, which is a + column or set of columns which determine how the data is split across the + partitions. The partition key is used by the database when reading or + writing data, to decide which partitions must be accessed. The + partition key should be a column that would be included in a `WHERE` + clause on almost all queries accessing that table. + +- **How the data is split**. What strategy does the database use + to split the data across the partitions? The available choices are `range`, + `hash`, and `list`. + +## Determine the appropriate partitioning strategy + +The available partitioning strategy choices are `range`, `hash`, and `list`. + +### Range partitioning + +The scheme best supported by the GitLab migration helpers is date-range partitioning, +where each partition in the table contains data for a single month. In this case, +the partitioning key must be a timestamp or date column. 
For this type of partitioning to work well, most queries must access data in a certain date range. -For a more concrete example, the `audit_events` table can be used, which -was the first table to be partitioned in the application database +For a more concrete example, consider using the `audit_events` table. +It was the first table to be partitioned in the application database (scheduled for deployment with the GitLab 13.5 release). This table tracks audit entries of security events that happen in the application. In almost all cases, users want to see audit activity that @@ -142,6 +156,31 @@ substantial. Partitioning should only be leveraged if the access patterns of the data support the partitioning strategy, otherwise performance suffers. +### Hash Partitioning + +Hash partitioning splits a logical table into a series of partitioned +tables. Each partition corresponds to the ID range that matches +a hash and remainder. For example, if partitioning `BY HASH(id)`, rows +with `hash(id) % 64 == 1` would end up in the partition +`WITH (MODULUS 64, REMAINDER 1)`. + +When hash partitioning, you must include a `WHERE hashed_column = ?` condition in +every performance-sensitive query issued by the application. If this is not possible, +hash partitioning may not be the correct fit for your use case. + +Hash partitioning has one main advantage: it is the only type of partitioning that +can enforce uniqueness on a single numeric `id` column. (While also possible with +range partitioning, it's rarely the correct choice). + +Hash partitioning has downsides: + +- The number of partitions must be known up-front. +- It's difficult to move new data to an extra partition if current partitions become too large. +- Range queries, such as `WHERE id BETWEEN ? and ?`, are unsupported. +- Lookups by other keys, such as `WHERE other_id = ?`, are unsupported. + +For this reason, it's often best to choose a large number of hash partitions to accommodate future table growth. 
+ ## Partitioning a table (Range) Unfortunately, tables can only be partitioned at their creation, making @@ -257,6 +296,18 @@ for use by the application. This section will be updated when the migration helper is ready, for now development can be followed in the [Tracking Issue](https://gitlab.com/gitlab-org/gitlab/-/issues/241267). +## Partitioning a table (Hash) + +Hash partitioning divides data into partitions based on a hash of their ID. +It works well only if most queries against the table include a clause like `WHERE id = ?`, +so that PostgreSQL can decide which partition to look in based on the ID or ids being requested. + +Another key downside is that hash partitioning does not allow adding additional partitions after table creation. +The correct number of partitions must be chosen up-front. + +Hash partitioning is the only type of partitioning (aside from some complex uses of list partitioning) that can guarantee +uniqueness of an ID across multiple partitions at the database level. + ## Partitioning a table (List) > [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/96815) in GitLab 15.4. @@ -504,3 +555,48 @@ class Model < ApplicationRecord self.sequence_name = 'model_id_seq' end ``` + +If the partitioning constraint migration takes [more than 10 minutes](../migration_style_guide.md#how-long-a-migration-should-take) to finish, +it can be made to run asynchronously to avoid running the post-migration during busy hours. + +Prepend the following migration `AsyncPrepareTableConstraintsForListPartitioning` +and use `async: true` option. This change marks the partitioning constraint as `NOT VALID` +and enqueues a scheduled job to validate the existing data in the table during the weekend. + +Then the second post-migration `PrepareTableConstraintsForListPartitioning` only +marks the partitioning constraint as validated, because the existing data is already +tested during the previous weekend. 
+ +For example: + +```ruby +class AsyncPrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1] + include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers + + disable_ddl_transaction! + + TABLE_NAME = :table_name + PARENT_TABLE_NAME = :p_table_name + FIRST_PARTITION = 100 + PARTITION_COLUMN = :partition_id + + def up + prepare_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION, + async: true + ) + end + + def down + revert_preparing_constraint_for_list_partitioning( + table_name: TABLE_NAME, + partitioning_column: PARTITION_COLUMN, + parent_table_name: PARENT_TABLE_NAME, + initial_partitioning_value: FIRST_PARTITION + ) + end +end +``` diff --git a/doc/development/database/transaction_guidelines.md b/doc/development/database/transaction_guidelines.md index 26bb6c2ce8f..1648cf58937 100644 --- a/doc/development/database/transaction_guidelines.md +++ b/doc/development/database/transaction_guidelines.md @@ -12,7 +12,7 @@ For further reference, check PostgreSQL documentation about [transactions](https ## Database decomposition and sharding -The [Pods Group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/pods/) plans +The [Pods group](https://about.gitlab.com/handbook/engineering/development/enablement/data_stores/tenant-scale/) plans to split the main GitLab database and move some of the database tables to other database servers. We start decomposing the `ci_*`-related database tables first. To maintain the current application |