summaryrefslogtreecommitdiff
path: root/doc/development/polymorphic_associations.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/polymorphic_associations.md')
-rw-r--r--doc/development/polymorphic_associations.md155
1 files changed, 7 insertions, 148 deletions
diff --git a/doc/development/polymorphic_associations.md b/doc/development/polymorphic_associations.md
index bbeaab40a90..6b9158b8408 100644
--- a/doc/development/polymorphic_associations.md
+++ b/doc/development/polymorphic_associations.md
@@ -1,152 +1,11 @@
---
-stage: none
-group: unassigned
-info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+redirect_to: 'database/polymorphic_associations.md'
+remove_date: '2022-11-04'
---
-# Polymorphic Associations
+This document was moved to [another location](database/polymorphic_associations.md).
-**Summary:** always use separate tables instead of polymorphic associations.
-
-Rails makes it possible to define so called "polymorphic associations". This
-usually works by adding two columns to a table: a target type column, and a
-target ID. For example, at the time of writing we have such a setup for
-`members` with the following columns:
-
-- `source_type`: a string defining the model to use, can be either `Project` or
- `Namespace`.
-- `source_id`: the ID of the row to retrieve based on `source_type`. For
- example, when `source_type` is `Project` then `source_id` contains a
- project ID.
-
-While such a setup may appear to be useful, it comes with many drawbacks; enough
-that you should avoid this at all costs.
-
-## Space Wasted
-
-Because this setup relies on string values to determine the model to use, it
-wastes a lot of space. For example, for `Project` and `Namespace` the
-maximum size is 9 bytes, plus 1 extra byte for every string when using
-PostgreSQL. While this may only be 10 bytes per row, given enough tables and
-rows using such a setup we can end up wasting quite a bit of disk space and
-memory (for any indexes).
-
-## Indexes
-
-Because our associations are broken up into two columns this may result in
-requiring composite indexes for queries to be performed efficiently. While
-composite indexes are not wrong at all, they can be tricky to set up as the
-ordering of columns in these indexes is important to ensure optimal performance.
-
-## Consistency
-
-One really big problem with polymorphic associations is being unable to enforce
-data consistency on the database level using foreign keys. For consistency to be
-enforced on the database level one would have to write their own foreign key
-logic to support polymorphic associations.
-
-Enforcing consistency on the database level is absolutely crucial for
-maintaining a healthy environment, and thus is another reason to avoid
-polymorphic associations.
-
-## Query Overhead
-
-When using polymorphic associations you always need to filter using both
-columns. For example, you may end up writing a query like this:
-
-```sql
-SELECT *
-FROM members
-WHERE source_type = 'Project'
-AND source_id = 13083;
-```
-
-Here PostgreSQL can perform the query quite efficiently if both columns are
-indexed. As the query gets more complex, it may not be able to use these
-indexes effectively.
-
-## Mixed Responsibilities
-
-Similar to functions and classes, a table should have a single responsibility:
-storing data with a certain set of pre-defined columns. When using polymorphic
-associations, you are storing different types of data (possibly with
-different columns set) in the same table.
-
-## The Solution
-
-Fortunately, there is a solution to these problems: use a
-separate table for every type you would otherwise store in the same table. Using
-a separate table allows you to use everything a database may provide to ensure
-consistency and query data efficiently, without any additional application logic
-being necessary.
-
-Let's say you have a `members` table storing both approved and pending members,
-for both projects and groups, and the pending state is determined by the column
-`requested_at` being set or not. Schema wise such a setup can lead to various
-columns only being set for certain rows, wasting space. It's also possible that
-certain indexes are only set for certain rows, again wasting space. Finally,
-querying such a table requires less than ideal queries. For example:
-
-```sql
-SELECT *
-FROM members
-WHERE requested_at IS NULL
-AND source_type = 'GroupMember'
-AND source_id = 4
-```
-
-Instead such a table should be broken up into separate tables. For example, you
-may end up with 4 tables in this case:
-
-- project_members
-- group_members
-- pending_project_members
-- pending_group_members
-
-This makes querying data trivial. For example, to get the members of a group
-you'd run:
-
-```sql
-SELECT *
-FROM group_members
-WHERE group_id = 4
-```
-
-To get all the pending members of a group in turn you'd run:
-
-```sql
-SELECT *
-FROM pending_group_members
-WHERE group_id = 4
-```
-
-If you want to get both you can use a `UNION`, though you need to be explicit
-about what columns you want to `SELECT` as otherwise the result set uses the
-columns of the first query. For example:
-
-```sql
-SELECT id, 'Group' AS target_type, group_id AS target_id
-FROM group_members
-
-UNION ALL
-
-SELECT id, 'Project' AS target_type, project_id AS target_id
-FROM project_members
-```
-
-The above example is perhaps a bit silly, but it shows that there's nothing
-stopping you from merging the data together and presenting it on the same page.
-Selecting columns explicitly can also speed up queries as the database has to do
-less work to get the data (compared to selecting all columns, even ones you're
-not using).
-
-Our schema also becomes easier. No longer do we need to both store and index the
-`source_type` column, we can define foreign keys easily, and we don't need to
-filter rows using the `IS NULL` condition.
-
-To summarize: using separate tables allows us to use foreign keys effectively,
-create indexes only where necessary, conserve space, query data more
-efficiently, and scale these tables more easily (for example, by storing them on
-separate disks). A nice side effect of this is that code can also become easier,
-as a single model isn't responsible for handling different kinds of
-data.
+<!-- This redirect file can be deleted after <2022-11-04>. -->
+<!-- Redirects that point to other docs in the same project expire in three months. -->
+<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. -->
+<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html -->