summaryrefslogtreecommitdiff
path: root/doc/development/database_review.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database_review.md')
-rw-r--r--doc/development/database_review.md67
1 files changed, 51 insertions, 16 deletions
diff --git a/doc/development/database_review.md b/doc/development/database_review.md
index d1ec32af464..f0c265df9ab 100644
--- a/doc/development/database_review.md
+++ b/doc/development/database_review.md
@@ -1,7 +1,7 @@
---
stage: Enablement
group: Database
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Database Review Guidelines
@@ -36,9 +36,22 @@ point out specific queries for review and there are no obviously
complex queries, it is enough to concentrate on reviewing the
migration only.
-It is preferable to review queries in SQL form and generally accepted
-to ask the author to translate any ActiveRecord queries in SQL form
-for review.
+### Required
+
+The following artifacts are required prior to submitting for a ~database review.
+If your merge request description does not include these items, the review will be reassigned back to the author.
+
+If new migrations are introduced, in the MR **you are required to provide**:
+
+- The output of both migrating and rolling back for all migrations
+
+If new queries have been introduced or existing queries have been updated, **you are required to provide**:
+
+- [Query plans](#query-plans) for each raw SQL query included in the merge request along with the link to the query plan following each raw SQL snippet.
+- [Raw SQL](#raw-sql) for all changed or added queries (as translated from ActiveRecord queries).
+ - In case of updating an existing query, the raw SQL of both the old and the new version of the query should be provided together with their query plans.
+
+Refer to [Preparation when adding or modifying queries](#preparation-when-adding-or-modifying-queries) for how to provide this information.
### Roles and process
@@ -47,9 +60,11 @@ A Merge Request **author**'s role is to:
- Decide whether a database review is needed.
- If database review is needed, add the ~database label.
- [Prepare the merge request for a database review](#how-to-prepare-the-merge-request-for-a-database-review).
+- Provide the [required](#required) artifacts prior to submitting the MR.
A database **reviewer**'s role is to:
+- Ensure the [required](#required) artifacts are provided and in the proper format. If they are not, reassign the merge request back to the author.
- Perform a first-pass review on the MR and suggest improvements to the author.
- Once satisfied, relabel the MR with ~"database::reviewed", approve it, and
reassign MR to the database **maintainer** suggested by Reviewer
@@ -104,23 +119,43 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
#### Preparation when adding or modifying queries
+##### Raw SQL
+
- Write the raw SQL in the MR description. Preferably formatted
nicely with [pgFormatter](https://sqlformat.darold.net) or
[paste.depesz.com](https://paste.depesz.com) and using regular quotes
(e.g. `"projects"."id"`) and avoiding smart quotes (e.g. `“projects”.“id”`).
-- Include the output of `EXPLAIN (ANALYZE, BUFFERS)` of the relevant
- queries in the description. If the output is too long, wrap it in
- `<details>` blocks, paste it in a GitLab Snippet, or provide the
- link to the plan at: [explain.depesz.com](https://explain.depesz.com).
+- In case of queries generated dynamically by using parameters, there should be one raw SQL query for each variation.
+
+ For example, a finder for issues that may take as a parameter an optional filter on projects,
+ should include both the version of the simple query over issues and the one that joins issues
+ and projects and applies the filter.
+
+ There are finders or other methods that can generate a very large amount of permutations.
+ There is no need to exhaustively add all the possible generated queries, just the one with
+ all the parameters included and one for each type of queries generated.
+
+ For example, if joins or a group by clause are optional, the versions without the group by clause
+ and with less joins should be also included, while keeping the appropriate filters for the remaining tables.
+
+- If a query is going to be always used with a limit and an offset, those should always be
+ included with the maximum allowed limit used and a non 0 offset.
+
+##### Query Plans
+
+- The query plan for each raw SQL query included in the merge request along with the link to the query plan following each raw SQL snippet.
+- Provide the link to the plan at: [explain.depesz.com](https://explain.depesz.com). Paste both the plan and the query used in the form.
- When providing query plans, make sure it hits enough data:
- You can use a GitLab production replica to test your queries on a large scale,
through the `#database-lab` Slack channel or through [chatops](understanding_explain_plans.md#chatops).
- Usually, the `gitlab-org` namespace (`namespace_id = 9970`) and the
`gitlab-org/gitlab-foss` (`project_id = 13083`) or the `gitlab-org/gitlab` (`project_id = 278964`)
projects provide enough data to serve as a good example.
-- For query changes, it is best to provide the SQL query along with a
- plan _before_ and _after_ the change. This helps to spot differences
- quickly.
+ - That means that no query plan should return 0 records or less records than the provided limit (if a limit is included). If a query is used in batching, a proper example batch with adequate included results should be identified and provided.
+ - If your queries belong to a new feature in GitLab.com and thus they don't return data in production, it's suggested to analyze the query and to provide the plan from a local environment.
+ - More information on how to find the number of actual returned records in [Understanding EXPLAIN plans](understanding_explain_plans.md)
+- For query changes, it is best to provide both the SQL queries along with the
+ plan _before_ and _after_ the change. This helps spot differences quickly.
- Include data that shows the performance improvement, preferably in
the form of a benchmark.
@@ -158,8 +193,8 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
- Maintainer: After the merge request is merged, notify Release Managers about it on `#f_upcoming_release` Slack channel.
- Check consistency with `db/structure.sql` and that migrations are [reversible](migration_style_guide.md#reversibility)
- Check that the relevant version files under `db/schema_migrations` were added or removed.
- - Check queries timing (If any): Queries executed in a migration
- need to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
+ - Check queries timing (If any): In a single transaction, cumulative query time executed in a migration
+ needs to fit comfortably within `15s` - preferably much less than that - on GitLab.com.
- For column removals, make sure the column has been [ignored in a previous release](what_requires_downtime.md#dropping-columns)
- Check [background migrations](background_migrations.md):
- Establish a time estimate for execution on GitLab.com. For historical purposes,
@@ -190,7 +225,7 @@ test its execution using `CREATE INDEX CONCURRENTLY` in the `#database-lab` Slac
- For given queries, review parameters regarding data distribution
- [Check query plans](understanding_explain_plans.md) and suggest improvements
to queries (changing the query, schema or adding indexes and similar)
- - General guideline is for queries to come in below 100ms execution time
+ - General guideline is for queries to come in below [100ms execution time](query_performance.md#timing-guidelines-for-queries)
- Avoid N+1 problems and minimalize the [query count](merge_request_performance_guidelines.md#query-counts).
### Timing guidelines for migrations
@@ -199,11 +234,11 @@ In general, migrations for a single deploy shouldn't take longer than
1 hour for GitLab.com. The following guidelines are not hard rules, they were
estimated to keep migration timing to a minimum.
-NOTE: **Note:**
+NOTE:
Keep in mind that all runtimes should be measured against GitLab.com.
| Migration Type | Execution Time Recommended | Notes |
|----|----|---|
| Regular migrations on `db/migrate` | `3 minutes` | A valid exception are index creation as this can take a long time. |
| Post migrations on `db/post_migrate` | `10 minutes` | |
-| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below `1 second` execution time with cold caches. |
+| Background migrations | --- | Since these are suitable for larger tables, it's not possible to set a precise timing guideline, however, any single query must stay below [`1 second` execution time](query_performance.md#timing-guidelines-for-queries) with cold caches. |