summaryrefslogtreecommitdiff
path: root/doc/development/filtering_by_label.md
diff options
context:
space:
mode:
authorGitLab Bot <gitlab-bot@gitlab.com>2021-09-30 09:12:38 +0000
committerGitLab Bot <gitlab-bot@gitlab.com>2021-09-30 09:12:38 +0000
commit5b80d465ae36e5f73ac974b20928aeac82634e20 (patch)
tree72cb34e0c37b0d2c47581926d8e0a90ff6278224 /doc/development/filtering_by_label.md
parentb920d2a9831056cdf907cf71fd25d94f0aaf1e6c (diff)
downloadgitlab-ce-5b80d465ae36e5f73ac974b20928aeac82634e20.tar.gz
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/development/filtering_by_label.md')
-rw-r--r--doc/development/filtering_by_label.md22
1 files changed, 17 insertions, 5 deletions
diff --git a/doc/development/filtering_by_label.md b/doc/development/filtering_by_label.md
index 2b9c7efc087..6f9811f7e05 100644
--- a/doc/development/filtering_by_label.md
+++ b/doc/development/filtering_by_label.md
@@ -82,6 +82,19 @@ AND (EXISTS (
While this worked without schema changes, and did improve readability somewhat,
it did not improve query performance.
+### Attempt A2: use label IDs in the WHERE EXISTS clause
+
+In [merge request #34503](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/34503), we followed a similar approach to A1. But this time, we
+did a separate query to fetch the IDs of the labels used in the filter so that we avoid the `JOIN` in the `EXISTS` clause and filter directly by
+`label_links.label_id`. We also added a new index on `label_links` for the `target_id`, `label_id`, and `target_type` columns to speed up this query.
+
+Finding the label IDs wasn't straightforward because there could be multiple labels with the same title within a single root namespace. We solved
+this by grouping the label IDs by title and then using the array of IDs in the `EXISTS` clauses.
+
+This resulted in a significant performance improvement. However, this optimization could not be applied to the dashboard pages
+where we do not have a project or group context. We could not easily search for the label IDs here because that would mean searching across all
+projects and groups that the user has access to.
+
## Attempt B: Denormalize using an array column
Having [removed MySQL support in GitLab 12.1](https://about.gitlab.com/blog/2019/06/27/removing-mysql-support/),
@@ -159,9 +172,8 @@ However, at present, the disadvantages outweigh the advantages.
## Conclusion
-We have yet to find a method that is demonstrably better than the current
-method, when considering:
+We found a method A2 that does not need denormalization and improves the query performance significantly. This
+did not apply to all cases, but we were able to apply method A1 to the rest of the cases so that we remove the
+`GROUP BY` and `HAVING` clauses in all scenarios.
-1. Query performance.
-1. Readability.
-1. Ease of maintaining schema consistency.
+This simplified the query and improved the performance in the most common cases.