diff options
Diffstat (limited to 'doc/development/database')
-rw-r--r-- | doc/development/database/efficient_in_operator_queries.md | 81 | ||||
-rw-r--r-- | doc/development/database/loose_foreign_keys.md | 33 |
2 files changed, 106 insertions, 8 deletions
diff --git a/doc/development/database/efficient_in_operator_queries.md b/doc/development/database/efficient_in_operator_queries.md index 1e706890f64..76518a6f5e2 100644 --- a/doc/development/database/efficient_in_operator_queries.md +++ b/doc/development/database/efficient_in_operator_queries.md @@ -589,6 +589,87 @@ LIMIT 20 NOTE: To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`. +#### Using calculated ORDER BY expression + +The following example orders epic records by the duration between the creation time and closed +time. It is calculated with the following formula: + +```sql +SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at) FROM epics +``` + +The query above returns the duration in seconds (`double precision`) between the two timestamp +columns in seconds. To order the records by this expression, you must reference it +in the `ORDER BY` clause: + +```sql +SELECT EXTRACT('epoch' FROM epics.closed_at - epics.created_at) +FROM epics +ORDER BY EXTRACT('epoch' FROM epics.closed_at - epics.created_at) DESC +``` + +To make this ordering efficient on the group-level with the in-operator optimization, use a +custom `ORDER BY` configuration. Since the duration is not a distinct value (no unique index +present), you must add a tie-breaker column (`id`). + +The following example shows the final `ORDER BY` clause: + +```sql +ORDER BY extract('epoch' FROM epics.closed_at - epics.created_at) DESC, epics.id DESC +``` + +Snippet for loading records ordered by the calcualted duration: + +```ruby +arel_table = Epic.arel_table +order = Gitlab::Pagination::Keyset::Order.build([ + Gitlab::Pagination::Keyset::ColumnOrderDefinition.new( + attribute_name: 'duration_in_seconds', + order_expression: Arel.sql('EXTRACT(EPOCH FROM epics.closed_at - epics.created_at)').desc, + distinct: false, + sql_type: 'double precision' # important for calculated SQL expressions + ), + Gitlab::Pagination::Keyset::ColumnOrderDefinition.new( + attribute_name: 'id', + order_expression: arel_table[:id].desc + ) +]) + +records = Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new( + scope: Epic.where.not(closed_at: nil).reorder(order), # filter out NULL values + array_scope: Group.find(9970).self_and_descendants.select(:id), + array_mapping_scope: -> (id_expression) { Epic.where(Epic.arel_table[:group_id].eq(id_expression)) } +).execute.limit(20) + +puts records.pluck(:duration_in_seconds, :id) # other columnns are not available +``` + +Building the query requires quite a bit of configuration. For the order configuration you +can find more information within the +[complex order configuration](keyset_pagination.md#complex-order-configuration) +section for keyset paginated database queries. + +The query requires a specialized database index: + +```sql +CREATE INDEX index_epics_on_duration ON epics USING btree (group_id, EXTRACT(EPOCH FROM epics.closed_at - epics.created_at) DESC, id DESC) WHERE (closed_at IS NOT NULL); +``` + +Notice that the `finder_query` parameter is not used. The query only returns the `ORDER BY` columns +which are the `duration_in_seconds` (calculated column) and the `id` columns. This is a limitation +of the feature, defining the `finder_query` with calculated `ORDER BY` expressions is not supported. +To get the complete database records, an extra query can be invoked by the returned `id` column: + +```ruby +records_by_id = records.index_by(&:id) +complete_records = Epic.where(id: records_by_id.keys).index_by(&:id) + +# Printing the complete records according to the `ORDER BY` clause +records_by_id.each do |id, _| + puts complete_records[id].attributes +end +``` + #### Batch iteration Batch iteration over the records is possible via the keyset `Iterator` class. diff --git a/doc/development/database/loose_foreign_keys.md b/doc/development/database/loose_foreign_keys.md index c60989f225d..97d150b1a18 100644 --- a/doc/development/database/loose_foreign_keys.md +++ b/doc/development/database/loose_foreign_keys.md @@ -43,7 +43,7 @@ we can: 1. Create a `DELETE` trigger on the `projects` table. Record the deletions in a separate table (`deleted_records`). -1. A job checks the `deleted_records` table every 5 minutes. +1. A job checks the `deleted_records` table every minute or two. 1. For each record in the table, delete the associated `ci_pipelines` records using the `project_id` column. @@ -89,9 +89,10 @@ ci_pipelines: ### Track record changes -To know about deletions in the `projects` table, configure a `DELETE` trigger using a database -migration (post-migration). The trigger needs to be configured only once. If the model already has -at least one `loose_foreign_key` definition, then this step can be skipped: +To know about deletions in the `projects` table, configure a `DELETE` trigger +using a [post-deployment migration](../post_deployment_migrations.md). The +trigger needs to be configured only once. If the model already has at least one +`loose_foreign_key` definition, then this step can be skipped: ```ruby class TrackProjectRecordChanges < Gitlab::Database::Migration[1.0] @@ -122,15 +123,20 @@ REFERENCES projects(id) ON DELETE CASCADE; ``` -The migration should run after the `DELETE` trigger is installed. If the foreign key is deleted -earlier, there is a good chance of introducing data inconsistency which needs manual cleanup: +The migration must run after the `DELETE` trigger is installed and the loose +foreign key definition is deployed. As such, it must be a [post-deployment +migration](../post_deployment_migrations.md) dated after the migration for the +trigger. If the foreign key is deleted earlier, there is a good chance of +introducing data inconsistency which needs manual cleanup: ```ruby class RemoveProjectsCiPipelineFk < Gitlab::Database::Migration[1.0] - enable_lock_retries! + disable_ddl_transaction! def up - remove_foreign_key_if_exists(:ci_pipelines, :projects, name: "fk_86635dbd80") + with_lock_retries do + remove_foreign_key_if_exists(:ci_pipelines, :projects, name: "fk_86635dbd80") + end end def down @@ -155,6 +161,17 @@ it_behaves_like 'it has loose foreign keys' do end ``` +**After** [removing a foreign key](#remove-the-foreign-key), +use the "`cleanup by a loose foreign key`" shared example to test a child record's deletion or nullification +via the added loose foreign key: + +```ruby +it_behaves_like 'cleanup by a loose foreign key' do + let!(:model) { create(:ci_pipeline, user: create(:user)) } + let!(:parent) { model.user } +end +``` + ## Caveats of loose foreign keys ### Record creation |