summaryrefslogtreecommitdiff
path: root/doc/development/database
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database')
-rw-r--r--doc/development/database/efficient_in_operator_queries.md81
-rw-r--r--doc/development/database/loose_foreign_keys.md33
2 files changed, 106 insertions, 8 deletions
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md
index 1e706890f64..76518a6f5e2 100644
--- a/doc/development/database/efficient_in_operator_queries.md
+++ b/doc/development/database/efficient_in_operator_queries.md
@@ -589,6 +589,87 @@ LIMIT 20
NOTE:
To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
+#### Using calculated ORDER BY expression
+
+The following example orders epic records by the duration between the creation time and closed
+time. It is calculated with the following formula:
+
+```sql
+SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at) FROM epics
+```
+
+The query above returns the duration in seconds (`double precision`) between the two timestamp
+columns in seconds. To order the records by this expression, you must reference it
+in the `ORDER BY` clause:
+
+```sql
+SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at)
+FROM epics
+ORDER BY EXTRACT('epoch' FROM epics.closed_at - epics.created_at) DESC
+```
+
+To make this ordering efficient on the group-level with the in-operator optimization, use a
+custom `ORDER BY` configuration. Since the duration is not a distinct value (no unique index
+present), you must add a tie-breaker column (`id`).
+
+The following example shows the final `ORDER BY` clause:
+
+```sql
+ORDER BY extract('epoch' FROM epics.closed_at - epics.created_at) DESC, epics.id DESC
+```
+
+Snippet for loading records ordered by the calcualted duration:
+
+```ruby
+arel_table = Epic.arel_table
+order = Gitlab::Pagination::Keyset::Order.build([
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'duration_in_seconds',
+ order_expression: Arel.sql('EXTRACT(EPOCH FROM epics.closed_at - epics.created_at)').desc,
+ distinct: false,
+ sql_type: 'double precision' # important for calculated SQL expressions
+ ),
+ Gitlab::Pagination::Keyset::ColumnOrderDefinition.new(
+ attribute_name: 'id',
+ order_expression: arel_table[:id].desc
+ )
+])
+
+records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
+ scope: Epic.where.not(closed_at: nil).reorder(order), # filter out NULL values
+ array_scope: Group.find(9970).self_and_descendants.select(:id),
+ array_mapping_scope: -> (id_expression) { Epic.where(Epic.arel_table[:group_id].eq(id_expression)) }
+).execute.limit(20)
+
+puts records.pluck(:duration_in_seconds, :id) # other columnns are not available
+```
+
+Building the query requires quite a bit of configuration. For the order configuration you
+can find more information within the
+[complex order configuration](keyset_pagination.md#complex-order-configuration)
+section for keyset paginated database queries.
+
+The query requires a specialized database index:
+
+```sql
+CREATE INDEX index_epics_on_duration ON epics USING btree (group_id, EXTRACT(EPOCH FROM epics.closed_at - epics.created_at) DESC, id DESC) WHERE (closed_at IS NOT NULL);
+```
+
+Notice that the `finder_query` parameter is not used. The query only returns the `ORDER BY` columns
+which are the `duration_in_seconds` (calculated column) and the `id` columns. This is a limitation
+of the feature, defining the `finder_query` with calculated `ORDER BY` expressions is not supported.
+To get the complete database records, an extra query can be invoked by the returned `id` column:
+
+```ruby
+records_by_id = records.index_by(&:id)
+complete_records = Epic.where(id: records_by_id.keys).index_by(&:id)
+
+# Printing the complete records according to the `ORDER BY` clause
+records_by_id.each do |id, _|
+ puts complete_records[id].attributes
+end
+```
+
#### Batch iteration
Batch iteration over the records is possible via the keyset `Iterator` class.
diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md
index c60989f225d..97d150b1a18 100644
--- a/doc/development/database/loose_foreign_keys.md
+++ b/doc/development/database/loose_foreign_keys.md
@@ -43,7 +43,7 @@ we can:
1. Create a `DELETE` trigger on the `projects` table.
Record the deletions in a separate table (`deleted_records`).
-1. A job checks the `deleted_records` table every 5 minutes.
+1. A job checks the `deleted_records` table every minute or two.
1. For each record in the table, delete the associated `ci_pipelines` records
using the `project_id` column.
@@ -89,9 +89,10 @@ ci_pipelines:
### Track record changes
-To know about deletions in the `projects` table, configure a `DELETE` trigger using a database
-migration (post-migration). The trigger needs to be configured only once. If the model already has
-at least one `loose_foreign_key` definition, then this step can be skipped:
+To know about deletions in the `projects` table, configure a `DELETE` trigger
+using a [post-deployment migration](../post_deployment_migrations.md). The
+trigger needs to be configured only once. If the model already has at least one
+`loose_foreign_key` definition, then this step can be skipped:
```ruby
class TrackProjectRecordChanges < Gitlab::Database::Migration[1.0]
@@ -122,15 +123,20 @@ REFERENCES projects(id)
ON DELETE CASCADE;
```
-The migration should run after the `DELETE` trigger is installed. If the foreign key is deleted
-earlier, there is a good chance of introducing data inconsistency which needs manual cleanup:
+The migration must run after the `DELETE` trigger is installed and the loose
+foreign key definition is deployed. As such, it must be a [post-deployment
+migration](../post_deployment_migrations.md) dated after the migration for the
+trigger. If the foreign key is deleted earlier, there is a good chance of
+introducing data inconsistency which needs manual cleanup:
```ruby
class RemoveProjectsCiPipelineFk < Gitlab::Database::Migration[1.0]
- enable_lock_retries!
+ disable_ddl_transaction!
def up
- remove_foreign_key_if_exists(:ci_pipelines, :projects, name: "fk_86635dbd80")
+ with_lock_retries do
+ remove_foreign_key_if_exists(:ci_pipelines, :projects, name: "fk_86635dbd80")
+ end
end
def down
@@ -155,6 +161,17 @@ it_behaves_like 'it has loose foreign keys' do
end
```
+**After** [removing a foreign key](#remove-the-foreign-key),
+use the "`cleanup by a loose foreign key`" shared example to test a child record's deletion or nullification
+via the added loose foreign key:
+
+```ruby
+it_behaves_like 'cleanup by a loose foreign key' do
+ let!(:model) { create(:ci_pipeline, user: create(:user)) }
+ let!(:parent) { model.user }
+end
+```
+
## Caveats of loose foreign keys
### Record creation