diff options
Diffstat (limited to 'doc/development/database_review.md')
-rw-r--r-- | doc/development/database_review.md | 67 |
1 files changed, 51 insertions, 16 deletions
diff --git a/doc/development/database_review.md b/doc/development/database_review.md index d1ec32af464..f0c265df9ab 100644 --- a/doc/development/database_review.md +++ b/doc/development/database_review.md @@ -1,7 +1,7 @@ --- stage: Enablement group: Database -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments --- # Database Review Guidelines @@ -36,9 +36,22 @@ point out specific queries for review and there are no obviously complex queries, it is enough to concentrate on reviewing the migration only. -It is preferable to review queries in SQL form and generally accepted -to ask the author to translate any ActiveRecord queries in SQL form -for review. +### Required + +The following artifacts are required prior to submitting for a ~database review. +If your merge request description does not include these items, the review will be reassigned back to the author. + +If new migrations are introduced, in the MR **you are required to provide**: + +- The output of both migrating and rolling back for all migrations + +If new queries have been introduced or existing queries have been updated, **you are required to provide**: + +- [Query plans](#query-plans) for each raw SQL query included in the merge request along with the link to the query plan following each raw SQL snippet. +- [Raw SQL](#raw-sql) for all changed or added queries (as translated from ActiveRecord queries). + - In case of updating an existing query, the raw SQL of both the old and the new version of the query should be provided together with their query plans. + +Refer to [Preparation when adding or modifying queries](#preparation-when-adding-or-modifying-queries) for how to provide this information. ### Roles and process @@ -47,9 +60,11 @@ A Merge Request **author**'s role is to: - Decide whether a database review is needed. - If database review is needed, add the ~database label. - [Prepare the merge request for a database review](#how-to-prepare-the-merge-request-for-a-database-review). +- Provide the [required](#required) artifacts prior to submitting the MR. A database **reviewer**'s role is to: +- Ensure the [required](#required) artifacts are provided and in the proper format. If they are not, reassign the merge request back to the author. - Perform a first-pass review on the MR and suggest improvements to the author. - Once satisfied, relabel the MR with ~"database::reviewed", approve it, and reassign MR to the database **maintainer** suggested by Reviewer @@ -104,23 +119,43 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac #### Preparation when adding or modifying queries +##### Raw SQL + - Write the raw SQL in the MR description. Preferably formatted nicely with [pgFormatter](https://sqlformat.darold.net) or [paste.depesz.com](https://paste.depesz.com) and using regular quotes (e.g. `"projects"."id"`) and avoiding smart quotes (e.g. `“projects”.“id”`). -- Include the output of `EXPLAIN (ANALYZE, BUFFERS)` of the relevant - queries in the description. If the output is too long, wrap it in - `<details>` blocks, paste it in a GitLab Snippet, or provide the - link to the plan at: [explain.depesz.com](https://explain.depesz.com). +- In case of queries generated dynamically by using parameters, there should be one raw SQL query for each variation. + + For example, a finder for issues that may take as a parameter an optional filter on projects, + should include both the version of the simple query over issues and the one that joins issues + and projects and applies the filter. + + There are finders or other methods that can generate a very large amount of permutations. + There is no need to exhaustively add all the possible generated queries, just the one with + all the parameters included and one for each type of queries generated. + + For example, if joins or a group by clause are optional, the versions without the group by clause + and with less joins should be also included, while keeping the appropriate filters for the remaining tables. + +- If a query is going to be always used with a limit and an offset, those should always be + included with the maximum allowed limit used and a non 0 offset. + +##### Query Plans + +- The query plan for each raw SQL query included in the merge request along with the link to the query plan following each raw SQL snippet. +- Provide the link to the plan at: [explain.depesz.com](https://explain.depesz.com). Paste both the plan and the query used in the form. - When providing query plans, make sure it hits enough data: - You can use a GitLab production replica to test your queries on a large scale, through the `#database-lab` Slack channel or through [chatops](understanding_explain_plans.md#chatops). - Usually, the `gitlab-org` namespace (`namespace_id = 9970`) and the `gitlab-org/gitlab-foss` (`project_id = 13083`) or the `gitlab-org/gitlab` (`project_id = 278964`) projects provide enough data to serve as a good example. -- For query changes, it is best to provide the SQL query along with a - plan _before_ and _after_ the change. This helps to spot differences - quickly. + - That means that no query plan should return 0 records or less records than the provided limit (if a limit is included). If a query is used in batching, a proper example batch with adequate included results should be identified and provided. + - If your queries belong to a new feature in GitLab.com and thus they don't return data in production, it's suggested to analyze the query and to provide the plan from a local environment. + - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](understanding_explain_plans.md) +- For query changes, it is best to provide both the SQL queries along with the + plan _before_ and _after_ the change. This helps spot differences quickly. - Include data that shows the performance improvement, preferably in the form of a benchmark. @@ -158,8 +193,8 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac - Maintainer: After the merge request is merged, notify Release Managers about it on `#f_upcoming_release` Slack channel. - Check consistency with `db/structure.sql` and that migrations are [reversible](migration_style_guide.md#reversibility) - Check that the relevant version files under `db/schema_migrations` were added or removed. - - Check queries timing (If any): Queries executed in a migration - need to fit comfortably within `15s` - preferably much less than that - on GitLab.com. + - Check queries timing (If any): In a single transaction, cumulative query time executed in a migration + needs to fit comfortably within `15s` - preferably much less than that - on GitLab.com. - For column removals, make sure the column has been [ignored in a previous release](what_requires_downtime.md#dropping-columns) - Check [background migrations](background_migrations.md): - Establish a time estimate for execution on GitLab.com. For historical purposes, @@ -190,7 +225,7 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac - For given queries, review parameters regarding data distribution - [Check query plans](understanding_explain_plans.md) and suggest improvements to queries (changing the query, schema or adding indexes and similar) - - General guideline is for queries to come in below 100ms execution time + - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries) - Avoid N+1 problems and minimalize the [query count](merge_request_performance_guidelines.md#query-counts). ### Timing guidelines for migrations @@ -199,11 +234,11 @@ In general, migrations for a single deploy shouldn't take longer than 1 hour for GitLab.com. The following guidelines are not hard rules, they were estimated to keep migration timing to a minimum. -NOTE: **Note:** +NOTE: Keep in mind that all runtimes should be measured against GitLab.com. | Migration Type | Execution Time Recommended | Notes | |----|----|---| | Regular migrations on `db/migrate` | `3 minutes` | A valid exception are index creation as this can take a long time. | | Post migrations on `db/post_migrate` | `10 minutes` | | -| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below `1 second` execution time with cold caches. | +| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. | |