commit 4bf395cded929b1f2d2419079d8107604c9f930f
Author:    GitLab Bot <gitlab-bot@gitlab.com>  2021-08-03 21:09:39 +0000
Committer: GitLab Bot <gitlab-bot@gitlab.com>  2021-08-03 21:09:39 +0000
Tree:      d6edb3ab04e1a8241f9ac44ebb789cfc6ebaeff9
Parent:    49058851264455c22a5ba00c8671b7d4cdfd8ee9
Add latest changes from gitlab-org/gitlab@master
Diffstat (limited to 'doc/architecture/blueprints/database')
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/db_terminology_v14_2.png                   | bin 0 -> 51264 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_calls_v14_2.png       | bin 0 -> 157824 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_fixed_v14_2.png       | bin 0 -> 85790 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_readwriteratio_v14_2.png       | bin 0 -> 93291 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_reads_v14_2.png  | bin 0 -> 60703 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_writes_v14_2.png | bin 0 -> 52727 bytes
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/index.md                                       | 12
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/read_mostly.md                                 | 152
-rw-r--r--  doc/architecture/blueprints/database/scalability/patterns/time_decay.md                                  | 361
9 files changed, 525 insertions, 0 deletions
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/db_terminology_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/db_terminology_v14_2.png
new file mode 100644
index 00000000000..85ba1360f06
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/db_terminology_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_calls_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_calls_v14_2.png
new file mode 100644
index 00000000000..f6ae995391c
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_calls_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_fixed_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_fixed_v14_2.png
new file mode 100644
index 00000000000..dcfaae230d3
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_licenses_fixed_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_readwriteratio_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_readwriteratio_v14_2.png
new file mode 100644
index 00000000000..9b85f814bc1
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_readwriteratio_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_reads_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_reads_v14_2.png
new file mode 100644
index 00000000000..4f448841e48
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_reads_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_writes_v14_2.png b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_writes_v14_2.png
new file mode 100644
index 00000000000..e4f5756051f
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/img/read_mostly_subscriptions_writes_v14_2.png
Binary files differ
diff --git a/doc/architecture/blueprints/database/scalability/patterns/index.md b/doc/architecture/blueprints/database/scalability/patterns/index.md
new file mode 100644
index 00000000000..dadf3634407
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/index.md
@@ -0,0 +1,12 @@
+---
+stage: Enablement
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+comments: false
+description: 'Learn how to scale the database through the use of best-in-class database scalability patterns'
+---
+
+# Database Scalability Patterns
+
+- [Read-mostly](read_mostly.md)
+- [Time-decay](time_decay.md)
diff --git a/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
new file mode 100644
index 00000000000..bd87573a88e
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/read_mostly.md
@@ -0,0 +1,152 @@
+---
+stage: Enablement
+group: database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+comments: false
+description: 'Learn how to scale operating on read-mostly data at scale'
+---
+
+# Read-mostly data
+
+[Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/326037) in GitLab 14.0.
+
+This document describes the *read-mostly* pattern introduced in the
+[Database Scalability Working Group](https://about.gitlab.com/company/team/structure/working-groups/database-scalability/#read-mostly-data).
+We discuss the characteristics of *read-mostly* data and propose best practices for GitLab development
+to consider in this context.
+
+## Characteristics of read-mostly data
+
+As the name suggests, *read-mostly* data is data that is read far more often than it is written.
+Writes through updates, inserts, or deletes are rare events compared to reads.
+
+In addition, *read-mostly* data in this context is typically a small dataset. We explicitly don't deal
+with large datasets here, even though they often have a "write once, read often" characteristic, too.
+
+### Example: license data
+
+Let's introduce a canonical example: license data in GitLab. A GitLab instance may have a license
+attached to use GitLab enterprise features. This license data is held instance-wide, that is,
+only a few relevant records typically exist. The information is kept in a very small table,
+`licenses`.
+
+We consider this *read-mostly* data, because it matches the characteristics outlined above:
+
+- **Rare writes:** license data very rarely sees any writes after having inserted the license.
+- **Frequent reads:** license data is read extremely often to check if enterprise features can be used.
+- **Small size:** this dataset is very small. On GitLab.com we have 5 records at < 50 kB total relation size.
+
+### Effects of *read-mostly* data at scale
+
+Given this dataset is small and read very often, we can expect the data to nearly always reside in
+database caches and/or database disk caches. Thus, the concern with *read-mostly* data is typically
+not database I/O overhead, because we rarely read the data from disk anyway.
+
+However, the high read frequency can still incur overhead in terms of database CPU load and
+database context switches. These high frequency queries also pass through the whole database
+stack, causing overhead on the database connection multiplexing components and load balancers.
+Additionally, the application spends cycles preparing and sending the queries, deserializing the
+results, and allocating new objects to represent the information gathered - all at high frequency.
+
+In the example of license data above, the query to read license data was
+[identified](https://gitlab.com/gitlab-org/gitlab/-/issues/292900) to stand out in terms of query
+frequency. In fact, we were seeing around 6,000 queries per second (QPS) on the cluster during peak
+times. With the cluster size at that time, we were seeing about 1,000 QPS on each replica, and fewer
+than 400 QPS on the primary at peak times. The difference is explained by our
+[database load balancing for scaling reads](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/database/load_balancing.rb),
+which favors replicas for pure read-only transactions.
+
+![Licenses Calls](img/read_mostly_licenses_calls_v14_2.png)
+
+The overall transaction throughput on the database primary at the time varied between 50,000 and
+70,000 transactions per second (TPS). In comparison, this query accounts for only a small portion
+of the overall query frequency. However, we still expect it to have considerable overhead in terms
+of context switches. It is worth removing this overhead, if we can.
+
+## How to recognize read-mostly data
+
+It can be difficult to recognize *read-mostly* data, even though there are clear cases such as our
+example.
+
+One approach is to look at the [read/write ratio and statistics from, for example, the primary](https://bit.ly/3frdtyz). Here, we look at the TOP20 tables by their read/write ratio over 60 minutes (taken at a peak traffic time):
+
+```plaintext
+bottomk(20,
+avg by (relname, fqdn) (
+ (
+ rate(pg_stat_user_tables_seq_tup_read{env="gprd"}[1h])
+ +
+ rate(pg_stat_user_tables_idx_tup_fetch{env="gprd"}[1h])
+ ) /
+ (
+ rate(pg_stat_user_tables_seq_tup_read{env="gprd"}[1h])
+ + rate(pg_stat_user_tables_idx_tup_fetch{env="gprd"}[1h])
+ + rate(pg_stat_user_tables_n_tup_ins{env="gprd"}[1h])
+ + rate(pg_stat_user_tables_n_tup_upd{env="gprd"}[1h])
+ + rate(pg_stat_user_tables_n_tup_del{env="gprd"}[1h])
+ )
+) and on (fqdn) (pg_replication_is_replica == 0)
+)
+```
+
+This yields a good impression of which tables are much more often read than written (on the database
+primary):
+
+![Read Write Ratio TOP20](img/read_mostly_readwriteratio_v14_2.png)
+
+From here, we can [zoom](https://bit.ly/2VmloX1) into, for example, `gitlab_subscriptions` and see that index reads peak above 10,000 tuples per second overall (there are no sequential scans):
+
+![Subscriptions: reads](img/read_mostly_subscriptions_reads_v14_2.png)
+
+In contrast, we write to the table very rarely:
+
+![Subscriptions: writes](img/read_mostly_subscriptions_writes_v14_2.png)
+
+Additionally, the table is only 400 MB in size, so it may be another candidate to consider for
+this pattern (see [#327483](https://gitlab.com/gitlab-org/gitlab/-/issues/327483)).
+
+## Best practices for handling read-mostly data at scale
+
+### Cache read-mostly data
+
+To reduce the database overhead, we implement a cache for the data and thus significantly
+reduce the query frequency on the database side. Different caching scopes are available:
+
+- `RequestStore`: per-request in-memory cache (based on the [request_store gem](https://github.com/steveklabnik/request_store))
+- [`ProcessMemoryCache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/process_memory_cache.rb#L4): per-process in-memory cache (an `ActiveSupport::Cache::MemoryStore`)
+- [`Gitlab::Redis::Cache`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/redis/cache.rb) and `Rails.cache`: full-blown cache in Redis
+
+Continuing the above example, we had a `RequestStore` in place to cache license information on a
+per-request basis. However, that still led to one query per request. When we started to cache license information
+[using a process-wide in-memory cache](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/50318)
+for 1 second, query frequency dropped dramatically:
+
+![Licenses Calls - Fixed](img/read_mostly_licenses_fixed_v14_2.png)
+
+The choice of cache highly depends on the characteristics of the data in question. A very small
+dataset like license data that is nearly never updated is a good candidate for in-memory caching.
+A per-process cache is favorable here, because it unties the cache refresh rate from the incoming
+request rate.
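+
+As an illustration only, a per-process cache with a short expiry could look like the following
+sketch. It uses `ActiveSupport::Cache::MemoryStore` directly with a hypothetical `License` lookup;
+it is not GitLab's actual implementation, which lives behind `ProcessMemoryCache`:
+
+```ruby
+require 'active_support'
+require 'active_support/cache'
+require 'active_support/core_ext/integer/time'
+
+class CurrentLicense
+  # One store per process; all threads in the process share it.
+  CACHE = ActiveSupport::Cache::MemoryStore.new
+
+  def self.license
+    # At most one database read per process per second, no matter how many
+    # requests check the license within that window.
+    CACHE.fetch('current_license', expires_in: 1.second) do
+      License.order(created_at: :desc).first
+    end
+  end
+end
+```
+
+Reading a value that is up to one second stale is acceptable for license checks, which is exactly
+why this kind of data qualifies for such aggressive caching.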
+
+A caveat here is that our Redis setup currently does not use Redis secondaries, so we rely on a
+single node for caching. That is, we need to strike a balance to avoid Redis falling over due to
+increased pressure. In comparison, reading data from PostgreSQL replicas can be distributed across
+several read-only replicas. Even though a query to the database might be more expensive, the
+load is balanced across more nodes.
+
+### Read read-mostly data from a replica
+
+With or without caching implemented, we must also make sure to read the data from database
+replicas when we can. This supports our efforts to scale reads across many database replicas and
+removes unnecessary workload from the database primary.
+
+GitLab [database load balancing for reads](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/gitlab/database/load_balancing.rb)
+sticks to the primary after a first write or when opening an explicit transaction. In the context
+of *read-mostly* data, we strive to read this data outside of a transaction scope and before doing
+any writes. This is often possible given that this data is seldom updated (and thus we're usually
+not concerned with reading slightly stale data, for example). However, it may not be obvious that
+a query cannot be sent to a replica because of a previous write or transaction. Hence, when we
+encounter *read-mostly* data, it is a good practice to check the wider context and make sure this
+data can be read from a replica, as sketched below.
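+
+As a minimal sketch of that ordering concern (the service, model, and attribute names are
+illustrative, not actual GitLab code), reading the *read-mostly* data before any write keeps the
+query eligible for a replica:
+
+```ruby
+class SomeService
+  def execute(project)
+    # Good: read the read-mostly data first, outside any transaction and
+    # before any write, so the load balancer may route the query to a replica.
+    license = License.order(created_at: :desc).first
+
+    ApplicationRecord.transaction do
+      # From the first write on, the connection sticks to the primary.
+      project.update!(last_activity_at: Time.current)
+
+      # Bad: the same lookup here would be forced onto the primary, because
+      # it now runs inside a transaction that has already written data.
+    end
+
+    license
+  end
+end
+```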
diff --git a/doc/architecture/blueprints/database/scalability/patterns/time_decay.md b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
new file mode 100644
index 00000000000..6e0187a8d74
--- /dev/null
+++ b/doc/architecture/blueprints/database/scalability/patterns/time_decay.md
@@ -0,0 +1,361 @@
+---
+stage: Enablement
+group: database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+comments: false
+description: 'Learn how to operate on large time-decay data'
+---
+
+# Time-decay data
+
+[Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/326035) in GitLab 14.0.
+
+This document describes the *time-decay pattern* introduced in the
+[Database Scalability Working Group](https://about.gitlab.com/company/team/structure/working-groups/database-scalability/#time-decay-data).
+We discuss the characteristics of time-decay data, and propose best practices for GitLab development
+to consider in this context.
+
+Some datasets are subject to strong time-decay effects, in which recent data is accessed far more
+frequently than older data. Another aspect of time-decay is that, over time, some types of data
+become less important. In that case, we can also move old data to less durable (less available)
+storage, or in extreme cases even delete the data.
+
+Those effects are usually tied to product or application semantics. They can vary in the degree to
+which older data are accessed, and in how useful or necessary older data are to the users or the
+application.
+
+Let's first consider entities with no inherent time-related bias for their data.
+
+A record for a user or a project may be equally important and frequently accessed, regardless of
+when it was created. We cannot predict from a user's `id` or `created_at` how often the related
+record is accessed or updated.
+
+On the other hand, good examples of datasets with extreme time-decay effects are logs and time
+series data, such as events recording user actions.
+
+Most of the time, such data have no business use after a couple of days or weeks, and quickly
+become less important even from a data analysis perspective. They represent a snapshot that
+quickly becomes less and less relevant to the current state of the application, until at some
+point it has no real value.
+
+Between the two extremes, we can find datasets that hold useful information we want to keep
+around, but whose older records are seldom accessed after an initial (short) time period following
+creation.
+
+## Characteristics of time-decay data
+
+We are interested in datasets that show the following characteristics:
+
+- **Size of the dataset:** they are considerably large.
+- **Access methods:** we can filter the vast majority of queries accessing the dataset
+  by a time-related dimension or a categorical dimension with time-decay effects.
+- **Immutability:** the time-decay status does not change.
+- **Retention:** whether we want to keep the old data or not, or whether old
+ data should remain accessible by users through the application.
+
+### Size of the dataset
+
+There can be datasets of variable sizes that show strong time-decay effects, but in the context of
+this blueprint, we intend to focus on entities with a **considerably large dataset**.
+
+Smaller datasets do not contribute significantly to database resource usage, nor do they inflict a
+considerable performance penalty on queries.
+
+In contrast, large datasets over about 50 million records, or 100 GB in size, add significant
+overhead to constantly accessing a really small subset of the data. In those cases, we want to use
+the time-decay effect to our advantage and reduce the actively accessed dataset.
+
+### Data access methods
+
+The second and most important characteristic of time-decay data is that most of the time, we are
+able to implicitly or explicitly access the data using a date filter,
+**restricting our results based on a time-related dimension**.
+
+There can be many such dimensions, but we are only going to focus on the creation date as it is both
+the most commonly used, and the one that we can control and optimize against. It:
+
+- Is immutable.
+- Is set when the record is created.
+- Can be tied to physically clustering the records, without having to move them around.
+
+It's important to add that even if time-decay data are not accessed that way by the application by
+default, you can make the vast majority of the queries explicitly filter the data in such a way.
+**Time-decay data without such a time-decay related access method are of no use from an optimization perspective, as there is no way to set and follow a scaling pattern.**
+
+We are not restricting the definition to data that are always accessed using a time-decay related
+access method, as there may be some outlier operations. These may be necessary, and we can accept
+that they do not scale, as long as the rest of the access methods can scale. An example:
+an administrator accessing all past events of a specific type, while all other operations only access
+a maximum of a month of events, restricted to 6 months in the past.
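+
+As a brief sketch (in a Rails/ActiveRecord context, with an illustrative `Event` model), a
+time-decay related access method is simply an explicit filter on the immutable creation date:
+
+```ruby
+# Typical application query: only a recent, small slice of the dataset.
+recent_events = Event.where('created_at > ?', 7.days.ago)
+
+# Outlier (for example, admin-only) operation scanning the full history.
+# Acceptable as long as it is rare and everything else follows the pattern.
+all_pushes = Event.where(action: :pushed)
+```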
+
+### Immutability
+
+The third characteristic of time-decay data is that their **time-decay status does not change**.
+Once they are considered "old", they cannot switch back to being "new" or relevant again.
+
+This definition may sound trivial, but we have to be able to make operations over "old" data **more**
+expensive (for example, by archiving or moving them to less expensive storage) without having to
+worry about that data becoming relevant again and important application operations underperforming
+as a result.
+
+As a counter-example to a time-decay access pattern, consider an application view that presents
+issues by when they were updated. We are also interested in the most recent data from an "update"
+perspective, but that definition is volatile and not actionable.
+
+### Retention
+
+Finally, a characteristic that further differentiates time-decay data into sub-categories with
+slightly different approaches available is **whether we want to keep the old data or not**
+(for example, a retention policy) and/or
+**whether old data will be accessible by users through the application**.
+
+#### (optional) Extended definition of time-decay data
+
+As a side note, if we extend the aforementioned definitions to access patterns that restrict access
+to a well-defined subset of the data based on a clustering attribute, we could use the time-decay
+scaling patterns for many other types of data.
+
+As an example, consider data that are only accessed while they are labeled as active, like To-Dos
+not marked as done, pipelines for unmerged merge requests (or a similar constraint that is not time based), and so on.
+In this case, instead of using a time dimension to define the decay, we use a categorical dimension
+(that is, one that uses a finite set of values) to define the subset of interest. As long as that
+subset is small compared to the overall size of the dataset, we could use the same approach.
+
+Similarly, we may define data as old based both on a time dimension and additional status attributes,
+such as CI pipelines that failed more than 6 months ago.
+
+## Time-decay data strategies
+
+### Partition tables
+
+This is the accepted best practice for addressing time-decay data from a pure database perspective.
+You can find more information on table partitioning for PostgreSQL in the
+[documentation page for table partitioning](https://www.postgresql.org/docs/12/ddl-partitioning.html).
+
+Partitioning by date intervals (for example, month or year) allows us to create much smaller tables
+(partitions) for each date interval and only access the most recent partition(s) for any
+application-related operation.
+
+We have to set the partitioning key based on the date interval of interest, which may depend on two
+factors:
+
+1. **How far back in time do we need to access data for?**
+ Partitioning by week is of no use if we always access data for a year back, as we would have to
+   execute queries over 52 different partitions (tables) each time. As an example, consider the
+   activity feed on the profile of any GitLab user.
+
+ In contrast, if we want to just access the last 7 days of created records, partitioning by year
+ would include too many unnecessary records in each partition, as is the case for `web_hook_logs`.
+1. **How large are the partitions created?**
+ The major purpose of partitioning is accessing tables that are as small as possible. If they get too
+ large by themselves, queries will start underperforming. We may have to re-partition (split) them
+   into even smaller partitions.
+
+The perfect partitioning scheme keeps **almost all queries over a dataset within a single partition**,
+with some cases going over two partitions and seldom over multiple partitions being
+an acceptable balance. We should also aim for **partitions that are as small as possible**, below
+5-10 million records and/or 10 GB each at maximum.
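+
+For illustration, monthly range partitioning in plain PostgreSQL could look like the following
+migration-style sketch (the table and partition names are made up; the actual GitLab migrations,
+linked in the use cases below, rely on dedicated partitioning helpers rather than raw DDL):
+
+```ruby
+class PartitionExampleLogs < ActiveRecord::Migration[6.1]
+  def up
+    execute(<<~SQL)
+      CREATE TABLE example_logs (
+        id bigserial NOT NULL,
+        created_at timestamptz NOT NULL,
+        data jsonb,
+        PRIMARY KEY (id, created_at)
+      ) PARTITION BY RANGE (created_at);
+
+      -- One partition per month: queries filtering on created_at only
+      -- touch the relevant partition(s).
+      CREATE TABLE example_logs_202108 PARTITION OF example_logs
+        FOR VALUES FROM ('2021-08-01') TO ('2021-09-01');
+      CREATE TABLE example_logs_202109 PARTITION OF example_logs
+        FOR VALUES FROM ('2021-09-01') TO ('2021-10-01');
+    SQL
+  end
+
+  def down
+    execute('DROP TABLE example_logs')
+  end
+end
+```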
+
+Partitioning can be combined with other strategies to either prune (drop) old partitions, move them
+to cheaper storage inside the database or move them outside of the database (archive or use of other
+types of storage engines).
+
+As long as we do not want to keep old records and partitioning is used, pruning old data has a
+constant (for all intents and purposes zero) cost compared to deleting the data from a huge table
+(as described in the following sub-section). We just need a background worker to drop an old
+partition whenever all the data inside it fall outside the retention policy's period.
+
+As an example, if we only want to keep records no more than 6 months old and we partition by month,
+we can safely keep the 7 latest partitions at all times (current month and 6 months in the past).
+That means that a worker can drop the oldest (8th) partition at the start of each month, as sketched below.
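+
+A sketch of such a worker, assuming a hypothetical `example_logs_YYYYMM` naming scheme and a
+6-month retention period (a real implementation would live in a Sidekiq worker and use the
+existing partitioning tooling):
+
+```ruby
+class DropExpiredPartitionsWorker
+  RETENTION_MONTHS = 6
+  PARENT_TABLE = 'example_logs'
+
+  def perform
+    # Keep the current month plus RETENTION_MONTHS back; anything older is
+    # dropped in constant time, with no row-by-row deletes.
+    oldest_month_to_keep = Date.current.beginning_of_month << RETENTION_MONTHS
+    connection = ActiveRecord::Base.connection
+
+    partitions.each do |partition|
+      suffix = partition[/\d{6}\z/]
+      next unless suffix && Date.strptime(suffix, '%Y%m') < oldest_month_to_keep
+
+      connection.execute("DROP TABLE IF EXISTS #{connection.quote_table_name(partition)}")
+    end
+  end
+
+  private
+
+  def partitions
+    # List the child partitions of the partitioned table.
+    ActiveRecord::Base.connection.select_values(<<~SQL)
+      SELECT inhrelid::regclass::text
+      FROM pg_inherits
+      WHERE inhparent = '#{PARENT_TABLE}'::regclass
+    SQL
+  end
+end
+```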
+
+Moving partitions to cheaper storage inside the same database is relatively simple in PostgreSQL
+through the use of [tablespaces](https://www.postgresql.org/docs/12/manage-ag-tablespaces.html).
+It is possible to specify a tablespace and storage parameters for each partition separately, so the
+approach in this case would be to:
+
+1. Create a new tablespace on a cheaper, slow disk.
+1. Set the storage parameters higher on that new tablespace so that the PostgreSQL optimizer knows that the disks are slower.
+1. Move the old partitions to the slow tablespace automatically by using background workers.
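+
+In plain PostgreSQL, those three steps could be sketched as follows (the tablespace name, path,
+and cost settings are illustrative):
+
+```ruby
+connection = ActiveRecord::Base.connection
+
+# 1. A tablespace backed by a cheaper, slower disk. The directory must already
+#    exist and be owned by the PostgreSQL OS user; CREATE TABLESPACE cannot run
+#    inside a transaction block.
+connection.execute("CREATE TABLESPACE archive_space LOCATION '/mnt/slow_disk/pg_archive'")
+
+# 2. Tell the planner that reads from this tablespace are more expensive.
+connection.execute('ALTER TABLESPACE archive_space SET (seq_page_cost = 2, random_page_cost = 8)')
+
+# 3. Move an old partition onto the slow tablespace; a background worker would
+#    do this for each partition as it ages out of the hot set.
+connection.execute('ALTER TABLE example_logs_202101 SET TABLESPACE archive_space')
+```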
+
+Finally, moving partitions outside of the database can be achieved through database archiving or
+manually exporting the partitions to a different storage engine (more details in the dedicated
+sub-section).
+
+### Prune old data
+
+If we don't want to keep old data around in any form, we can implement a pruning strategy and
+delete old data.
+
+It's a simple-to-implement strategy that uses a pruning worker to delete past data. As an example
+that we further analyze below, we prune `web_hook_logs` records older than 90 days.
+
+The disadvantage of such a solution over large, non-partitioned tables is that we have to manually
+access and delete all the records that are no longer considered relevant. That is a very
+expensive operation, due to multi-version concurrency control in PostgreSQL. It also leads to the
+pruning worker not being able to catch up with new records being created, if that rate exceeds a
+threshold, as was the case for [web_hook_logs](https://gitlab.com/gitlab-org/gitlab/-/issues/256088)
+at the time of writing this document.
+
+For the aforementioned reasons, our proposal is that
+**we should base any implementation of a data retention strategy on partitioning**,
+unless there are strong reasons not to.
+
+### Move old data outside of the database
+
+In most cases, we consider old data as valuable, so we do not want to prune them. If, at the same
+time, they are not required for any database-related operations (for example, directly accessed or
+used in joins and other types of queries), we can move them outside of the database.
+
+That does not mean that they are not directly accessible by users through the application; we could
+move data outside the database and use other storage engines or access types for them, similarly to
+offloading metadata but only for the case of old data.
+
+In the simplest use case, we can provide fast and direct access to recent data, while allowing users
+to download an archive with older data. This is an option evaluated in the `audit_events` use case.
+Depending on the country and industry, audit events may have a very long retention period, while
+only the past month(s) of data are actively accessed through the GitLab interface.
+
+Additional use cases may include exporting data to a data warehouse or other types of data stores,
+as they may be better suited for processing that type of data. An example is JSON logs that we
+sometimes store in tables: loading such data into BigQuery or a columnar store like Redshift may
+be better for analyzing and querying the data.
+
+We might consider a number of strategies for moving data outside of the database:
+
+1. Streaming this type of data into logs and then moving them to secondary storage options
+   or loading them directly into other types of data stores (as CSV/JSON data).
+1. Creating an ETL process that exports the data to CSV, uploads them to object storage,
+ drops this data from the database, and then loads the CSV into a different data store.
+1. Loading the data in the background by using the API provided by the data store.
+
+The last option may not be a viable solution for large datasets; as long as bulk uploading using
+files is an option, it should outperform direct API calls.
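+
+As a rough sketch of the export step in the second approach (the model, columns, cutoff, and path
+are illustrative, and a production implementation would more likely use PostgreSQL `COPY` for
+throughput):
+
+```ruby
+require 'csv'
+
+CUTOFF = 6.months.ago
+COLUMNS = %w[id author_id entity_type created_at].freeze
+
+CSV.open('/tmp/audit_events_export.csv', 'w') do |csv|
+  csv << COLUMNS
+
+  # Stream old rows in batches to keep memory usage flat; the resulting file
+  # can then be uploaded to object storage and loaded into a warehouse.
+  AuditEvent.where('created_at < ?', CUTOFF).find_each(batch_size: 10_000) do |event|
+    csv << COLUMNS.map { |column| event[column] }
+  end
+end
+```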
+
+## Use cases
+
+### Web hook logs
+
+Related epic: [Partitioning: `web_hook_logs` table](https://gitlab.com/groups/gitlab-org/-/epics/5558)
+
+The important characteristics of `web_hook_logs` are the following:
+
+1. Size of the dataset: it is a really large table. At the time we decided to
+   partition it (`2021-03-01`), it had roughly 527M records and a total size of roughly 1 TB:
+
+ - Table: `web_hook_logs`
+ - Rows: approximately 527M
+ - Total size: 1.02 TiB (10.46%)
+ - Table size: 713.02 GiB (13.37%)
+ - Index(es) size: 42.26 GiB (1.10%)
+ - TOAST size: 279.01 GiB (38.56%)
+
+1. Access methods: we always request at most the past 7 days of logs.
+1. Immutability: it can be partitioned by `created_at`, an attribute that does not change.
+1. Retention: there is a 90-day retention policy set for it.
+
+Additionally, at the time we were trying to prune the data by using a background worker
+(`PruneWebHookLogsWorker`), which could not [keep up with the rate of inserts](https://gitlab.com/gitlab-org/gitlab/-/issues/256088).
+
+As a result, in March 2021 there were still undeleted records dating back to July 2020, and the
+table was growing by more than 2 million records per day instead of staying at a more or less
+stable size.
+
+Finally, the rate of inserts had grown to more than 170 GB of data per month by March 2021 and kept
+growing, so the only viable solution for pruning old data was through partitioning.
+
+Our approach was to partition the table by month, as that aligned with the 90-day retention policy.
+
+The required process is as follows:
+
+1. Decide on a partitioning key
+
+ Using the `created_at` column is straightforward in this case: it is a natural
+ partitioning key when a retention policy exists and there were no conflicting access patterns.
+
+1. After we decide on the partitioning key, we can create the partitions and backfill
+ them (copy data from the existing table). We can't just partition an existing table;
+ we have to create a new partitioned table.
+
+ So, we have to create the partitioned table and all the related partitions, start copying everything
+ over, and also add sync triggers so that any new data or updates/deletes to existing data can be
+   mirrored to the new partitioned table (a sketch of such a sync trigger follows after this list).
+
+ [MR with all the necessary details on how to start partitioning a table](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/55938)
+
+ It required 15 days and 7.6 hours to complete that process.
+
+1. One milestone after the initial partitioning starts, clean up after the background migration
+   used for the backfill: finish executing any remaining jobs, retry failed jobs, and so on.
+
+ [MR with all the necessary details](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/57580)
+
+1. Add any remaining foreign keys and secondary indexes to the partitioned table. This brings
+   its schema on par with the original non-partitioned table before we can swap them in the next milestone.
+
+   We do not add them at the beginning, as they add overhead to each insert and would slow down
+   the initial backfilling of the table (in this case, for more than half a billion records, which
+   adds up significantly). So we create a lightweight, *vanilla* version of the table, copy all
+   the data, and then add any remaining indexes and foreign keys.
+
+1. Swap the base table with the partitioned copy: this is when the partitioned table starts
+   being actively used by the application.
+
+   Dropping the original table is a destructive operation, and we want to make sure that we had no
+   issues during the process, so we keep the old non-partitioned table. We also switch the sync
+   trigger the other way around so that the non-partitioned table stays up to date with any
+   operations happening on the partitioned table. That allows us to swap the tables back if necessary.
+
+ [MR with all the necessary details](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/60184)
+
+1. Last step, one milestone after the swap: drop the non-partitioned table.
+
+ [Issue with all the necessary details](https://gitlab.com/gitlab-org/gitlab/-/issues/323678)
+
+1. After the non-partitioned table is dropped, we can add a worker to implement the
+ pruning strategy by dropping past partitions.
+
+ In this case, the worker will make sure that only 4 partitions are always active (as the
+ retention policy is 90 days) and drop any partitions older than four months. We have to keep 4
+ months of partitions while the current month is still active, as going 90 days back takes you to
+ the fourth oldest partition.
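+
+Going back to step 2, the sync trigger that mirrors writes from the original table to the
+partitioned copy can be sketched in plain PostgreSQL as follows (the names are illustrative, it
+assumes both tables share the same column order, and the linked MR contains the actual
+implementation, which uses dedicated helpers):
+
+```ruby
+ActiveRecord::Base.connection.execute(<<~SQL)
+  CREATE FUNCTION sync_web_hook_logs() RETURNS trigger
+  LANGUAGE plpgsql AS
+  $$
+  BEGIN
+    IF TG_OP = 'INSERT' THEN
+      INSERT INTO web_hook_logs_partitioned VALUES (NEW.*);
+    ELSIF TG_OP = 'UPDATE' THEN
+      DELETE FROM web_hook_logs_partitioned WHERE id = OLD.id;
+      INSERT INTO web_hook_logs_partitioned VALUES (NEW.*);
+    ELSIF TG_OP = 'DELETE' THEN
+      DELETE FROM web_hook_logs_partitioned WHERE id = OLD.id;
+    END IF;
+
+    RETURN NULL;
+  END
+  $$;
+
+  CREATE TRIGGER web_hook_logs_sync
+  AFTER INSERT OR UPDATE OR DELETE ON web_hook_logs
+  FOR EACH ROW EXECUTE FUNCTION sync_web_hook_logs();
+SQL
+```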
+
+### Audit Events
+
+Related epic: [Partitioning: Design and implement partitioning strategy for Audit Events](https://gitlab.com/groups/gitlab-org/-/epics/3206)
+
+The `audit_events` table shares a lot of characteristics with the `web_hook_logs` table discussed
+in the previous sub-section, so we focus on the points where they differ.
+
+The consensus was that
+[partitioning could solve most of the performance issues](https://gitlab.com/groups/gitlab-org/-/epics/3206#note_338157248).
+
+In contrast to most other large tables, it has no major conflicting access patterns: we could switch
+the access patterns to align with partitioning by month. This is not the case, for example, for
+other tables that could justify a partitioning approach (for example, by namespace) but have many
+conflicting access patterns.
+
+In addition, `audit_events` is a write-heavy table with very few reads (queries) over it, and it has
+a very simple schema: it is not connected to the rest of the database (no incoming or outgoing
+foreign key constraints) and has only two indexes defined over it.
+
+The latter was important at the time, as not having foreign key constraints meant that we could
+partition it while we were still on PostgreSQL 11. *This is no longer a concern now that we have
+moved to PostgreSQL 12 as the required default, as can be seen in the `web_hook_logs` use case above.*
+
+The migrations and steps required for partitioning `audit_events` are similar to
+the ones described in the previous sub-section for `web_hook_logs`. There is no retention
+strategy defined for `audit_events` at the moment, so there is no pruning strategy
+implemented over it, but we may implement an archiving solution in the future.
+
+What's interesting in the case of `audit_events` is the discussion of the steps we had to follow
+to implement the UI/UX changes needed to
+[encourage optimal querying of the partitioned table](https://gitlab.com/gitlab-org/gitlab/-/issues/223260).
+It can be used as a starting point for the changes required at the application level
+to align all access patterns with a specific time-decay related access method.
+
+### CI tables
+
+NOTE:
+Requirements and analysis of the CI tables use case are still a work in progress. We intend
+to add more details as the analysis moves forward.