Diffstat (limited to 'doc/development/caching.md')
-rw-r--r-- | doc/development/caching.md | 348 |
1 files changed, 348 insertions, 0 deletions
diff --git a/doc/development/caching.md b/doc/development/caching.md
new file mode 100644
index 00000000000..20847832e37
--- /dev/null
+++ b/doc/development/caching.md
@@ -0,0 +1,348 @@
---
stage: none
group: unassigned
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Caching guidelines

This document describes the various caching strategies in use at GitLab, how to implement
them effectively, and various gotchas. This material was extracted from the excellent
[Caching Workshop](https://gitlab.com/gitlab-org/create-stage/-/issues/12820).

## What is a cache?

A faster store for data, which is:

- Used in many areas of computing.
  - Processors have caches, hard disks have caches, lots of things have caches!
- Often closer to where you want the data to finally end up.
- A simpler store for data.
- Temporary.

## What is fast?

The goal for every web page should be to return in under 100ms:

- This is achievable, but you need caching on a modern application.
- Larger responses take longer to build, and caching becomes critical to maintaining a constant speed.
- Cache reads are typically sub-1ms. There is very little that this doesn't improve.
- It's no good only being fast on subsequent page loads, as the initial experience
  is important too, so this isn't a complete solution.
- User-specific data makes this challenging, and presents the biggest challenge
  in refactoring existing applications to meet this speed goal.
- User-specific caches can still be effective, but they result in fewer cache
  hits than generic caches shared between users.
- We're aiming to always have a majority of a page load pulled from the cache.

## Why use a cache?

- To make things faster!
- To avoid I/O:
  - Disk reads.
  - Database queries.
  - Network requests.
- To avoid recalculating the same result multiple times:
  - View rendering.
  - JSON rendering.
  - Markdown rendering.
- To provide redundancy. In some cases, caching can help disguise failures elsewhere,
  such as Cloudflare's "Always Online" feature.
- To reduce memory consumption. Fetching one big cached string is cheaper than
  building it in Ruby each time.
- To save money. This is especially true in cloud computing, where processors are
  expensive compared to RAM.

## Doubts about caching

- Some engineers are opposed to caching except as a last resort, considering it to
  be a hack, and believe that the real solution is to improve the underlying code to be faster.
- This could be fed by fear of cache expiry, which is understandable.
- But caching is _still faster_.
- You must use both techniques to achieve true performance:
  - There's no point caching if the initial cold write is so slow it times out, for example.
  - But there are few cases where caching isn't a performance boost.
- However, you can totally use caching as a quick hack, and that's cool too.
  Sometimes the "real" fix takes months, and caching takes only a day to implement.

### Caching at GitLab

Despite downsides to Redis caching, you should still feel free to make good use of the
caching setup inside the GitLab application and on GitLab.com. Our
[forecasting for cache utilization](https://gitlab-com.gitlab.io/gl-infra/tamland/saturation.html)
indicates we have plenty of headroom.

## Workflow

## Methodology

1. Cache as close to your final user as possible, as often as possible.
   - Caching your view rendering is by far the best performance improvement.
1. Try to cache as much data for as many users as possible:
   - Generic data can be cached for everyone.
   - You must keep this in mind when building new features.
1. Try to preserve cache data as much as possible:
   - Use nested caches to maintain as much cached data as possible across expiries.
1. Perform as few requests to the cache as possible:
   - This reduces variable latency caused by network issues.
   - Lower overhead for each read on the cache.

### Identify what benefits from caching

Is the cache being added "worthy"? This can be hard to measure, but you can consider:

- How large is the cached data?
  - This might affect what type of cache storage you should use, such as storing
    large HTML responses on disk rather than in RAM.
- How much I/O, CPU, and response time is saved by caching the data?
  - If your cached data is large but the time taken to render it is low, such as
    dumping a big chunk of text into the page, this might not be the best place to cache it.
- How often is this data accessed?
  - Caching frequently-accessed data usually has a greater effect.
- How often does this data change?
  - If the cache expires before it is read again, is this cache actually useful?

### Tools

#### Investigation

- The performance bar is your first step when investigating locally and in production.
  Look for expensive queries, excessive Redis calls, etc.
- Generate a flamegraph: add `?performance_bar=flamegraph` to the URL to help find
  the methods where time is being spent.
- Dive into the Rails logs:
  - Look closely at render times of partials too.
  - To measure the response time alone, you can parse the JSON logs using `jq`:
    - `tail -f log/development_json.log | jq ".duration_s"`
    - `tail -f log/api_json.log | jq ".duration_s"`
  - Some pointers for items to watch when you tail `development.log`:
    - `tail -f log/development.log | grep "cache hits"`
    - `tail -f log/development.log | grep "Rendered "`
- After you're looking in the right place:
  - Remove or comment out sections of code until you find the cause.
  - Use `binding.pry` to poke about in live requests. This requires a foreground
    web process like [Thin](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/pry.md).
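When grepping for "cache hits" in the logs, it can help to picture what a hit or miss actually is. The following is a minimal, plain-Ruby sketch of read-through `fetch` semantics in the spirit of `Rails.cache.fetch`; the `MiniCache` class and its hit/miss counters are hypothetical, for local experimentation only, not GitLab's actual cache implementation:

```ruby
# A toy read-through cache: look the key up, and only run the
# (expensive) block on a miss. Counts hits and misses so you can see
# what the "cache hits" log lines are reporting.
class MiniCache
  attr_reader :hits, :misses

  def initialize
    @store = {}
    @hits = 0
    @misses = 0
  end

  def fetch(key)
    if @store.key?(key)
      @hits += 1
      @store[key]
    else
      @misses += 1
      @store[key] = yield
    end
  end
end

cache = MiniCache.new

# Simulate an expensive render: the block only runs on the first call.
3.times { cache.fetch("projects/16/tag_list") { "rendered tag list" } }

puts cache.hits   # 2
puts cache.misses # 1
```

The same mental model applies to the methodology point above: each `fetch` is a network round-trip in production, so batching keys into fewer calls reduces latency.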

#### Verification

- Grafana, in particular the following dashboards:
  - [`api: Rails Controller`](https://dashboards.gitlab.net/d/api-rails-controller/api-rails-controller?orgId=1)
  - [`web: Rails Controller`](https://dashboards.gitlab.net/d/web-rails-controller/web-rails-controller?orgId=1)
  - [`redis-cache: Overview`](https://dashboards.gitlab.net/d/redis-cache-main/redis-cache-overview?orgId=1)
- Logs:
  - For situations where Grafana charts don't cover what you need, use Kibana instead.
- Feature flags:
  - It's nearly always worth using a feature flag when adding a cache.
  - Toggle it on and off and watch the wiggly lines in Grafana.
  - Expect response times to go up initially as the caches warm.
  - The effect isn't obvious until you're running the flag at 100%.
- Performance bar:
  - Use this locally and look for the cache calls in the Redis list.
  - Also use this in production to verify your cache keys are what you expect.
- Flamegraphs:
  - Append `?performance_bar=flamegraph` to the URL.

## Cache levels

### High level

- HTTP caching:
  - Use ETags and expiry times to instruct browsers to serve their own cached versions.
  - This _does_ still hit Rails, but skips the view layer.
- HTTP caching in a reverse proxy cache:
  - Same as above, but with a `public` setting.
  - Instead of the browser, this instructs a reverse proxy (such as NGINX, HAProxy, or Varnish) to serve a cached version.
  - Subsequent requests never hit Rails.
- HTML page caching:
  - Write an HTML file to disk.
  - The web server (such as NGINX, Apache, or Caddy) serves the HTML file itself, skipping Rails.
- View or action caching:
  - Rails writes the entire rendered view into its cache store and serves it back.
- Fragment caching:
  - Cache parts of a view in the Rails cache store.
  - Cached parts are inserted into the view as it renders.

### Low level

1. Method caching:
   - Calling the same method multiple times but only calculating the value once.
   - Stored in Ruby memory.
   - `@article ||= Article.find(params[:id])`
   - `strong_memoize { Article.find(params[:id]) }`
1. Request caching:
   - Return the same value for a key for the duration of a web request.
   - `Gitlab::SafeRequestStore.fetch`
1. Read-through or write-through SQL caching:
   - A cache sitting in front of the database.
   - Rails does this within a request for the same query.
1. Novelty caches:
   - Hyper-specific caches for one use case.

### Rails' built-in caching helpers

This is well documented in the [Rails guides](https://guides.rubyonrails.org/caching_with_rails.html).

- HTML page caching and action caching are no longer included by default, but they are still useful.
- The Rails guides call HTTP caching
  [Conditional GET](https://guides.rubyonrails.org/caching_with_rails.html#conditional-get-support).
- For Rails' cache store, remember two very important (and almost identical) methods:
  - `cache` in views, which is almost an alias for:
  - `Rails.cache.fetch`, which you can use everywhere.
- `cache` includes a "template tree digest", which changes when you modify your view files.

#### Rails cache options

##### `expires_in`

This sets the Time To Live (TTL) for the cache entry, and is the single most useful
(and most commonly used) cache option. It is supported in most Rails caching helpers.

##### `race_condition_ttl`

This option prevents multiple uncached hits for a key at the same time.
The first process that finds the key expired bumps the TTL by this amount, and it
then sets the new cache value.

Use this when a cache key is under very heavy load to prevent multiple simultaneous
writes, but set it to a low value, such as 10 seconds.

### When to use HTTP caching

Use conditional GET caching when the entire response is cacheable:

- There is no privacy risk when you aren't using public caches. You're only caching what
  the user sees, for that user, in their browser.
- Particularly useful on [endpoints that get polled](polling.md#polling-with-etag-caching).
- Good examples:
  - A list of discussions that we poll for updates. Use the last created entry's `updated_at` value for the `etag`.
  - API endpoints.

#### Possible downsides

- Users and API libraries can ignore the cache.
- Sometimes Chrome does weird things with caches.
- You will forget it exists in development mode and get angry when your changes aren't appearing.
- In theory, using conditional GET caching makes sense everywhere, but in practice it can
  sometimes cause odd issues.

### When to use view or action caching

This is no longer very commonly used in the Rails world:

- Support for it was removed from the Rails core.
- It's usually better to look at reverse proxy caching or conditional GET responses.
- However, it offers a somewhat simple way of emulating HTML page caching without
  writing to disk, which makes it useful in cloud environments.
- It stores rather large chunks of markup in the cache store.
- We do have a custom implementation of this available in the API, where it is more
  useful, in `cache_action`.

### When to use fragment caching

All the time!

- Probably the most useful caching type to use in Rails, as it allows you to cache sections
  of views, entire partials, or collections of partials.
- Rendered collections of partials should be engineered with the goal of using
  `cached: true` on them.
- It's faster to cache around the render call for a partial than inside the partial,
  but then you lose out on the template tree digest, which means the caches don't expire
  automatically when you update that partial.
- Beware of introducing lots of cache calls, such as placing a cache call inside a loop.
  Sometimes it's unavoidable, but there are options for getting around this, like partial
  collection caching.
- View rendering and JSON generation are slow, and should be cached wherever possible.
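The template tree digest mentioned above is what makes fragment caches self-expiring: editing the template changes the digest, so the key changes and the stale entry is simply never read again. The following is a plain-Ruby sketch of how such a key is composed, loosely modeled on the keys shown in Rails logs; the helper names (`template_digest`, `record_cache_key`, `fragment_key`) are illustrative, not the actual Rails internals:

```ruby
require "digest"

# Digest of the template source: changes whenever the view file changes.
def template_digest(template_source)
  Digest::MD5.hexdigest(template_source)
end

# Record part of the key: model name, ID, and updated_at timestamp,
# so the key also changes whenever the record is touched.
def record_cache_key(model:, id:, updated_at:)
  "#{model}/#{id}-#{updated_at}"
end

def fragment_key(view, template_source, record_key)
  "views/#{view}:#{template_digest(template_source)}/#{record_key}"
end

record = record_cache_key(model: "projects", id: 16, updated_at: "20210316142425469452")
key_v1 = fragment_key("projects/_home_panel", "<h1>Panel v1</h1>", record)
key_v2 = fragment_key("projects/_home_panel", "<h1>Panel v2</h1>", record)

# Editing the template changed the digest, so the old entry is orphaned
# rather than explicitly deleted.
puts key_v1 != key_v2 # true
```

This is also why caching around the render call (rather than inside the partial) loses automatic expiry: the digest of the partial never enters the key.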

### When to use method caching

- Using instance variables or [strong_memoize](utilities.md#strongmemoize) is something we all tend to do anyway.
- Useful when the same value is needed multiple times in a request.
- Can be used to prevent multiple cache calls for the same key.
- Can cause issues with ActiveRecord objects where a value doesn't change until you call
  `#reload`, which tends to crop up in the test suite.

### When to use request caching

- Similar usage pattern to method caching, but it can be used across multiple methods.
- A standardized way of storing something for the duration of a request.
- As the lookup is similar to a cache lookup (in the GitLab implementation), we can use
  the same key for both. This is how `Gitlab::Cache.fetch_once` works.

### When to use SQL caching

Rails uses this automatically for identical queries in a request, so no action is
needed for that use case.

- However, using a gem like `identity_cache` has a different purpose: caching queries
  across multiple requests.
- Avoid using it on single-object lookups, like `Article.find(params[:id])`.
- Sometimes it's not possible to use the result, as it provides a read-only object.
- It can also cache relationships, which is useful in situations where we want to return a
  list of things but don't care about filtering or ordering them differently.

### When to use a novelty cache

If you've exhausted other options and must cache something that's really awkward,
it's time to look at a custom solution:

- Examples in GitLab include `RepositorySetCache`, `RepositoryHashCache`, and `AvatarCache`.
- Where possible, you should avoid creating custom cache implementations, as they add
  inconsistency.
- They can be extremely effective. For example, the caching around `merged_branch_names`
  uses [RepositoryHashCache](https://gitlab.com/gitlab-org/gitlab/-/issues/30536#note_290824711).
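One reason to prefer `strong_memoize` over the `||=` idiom for method caching: `||=` re-runs the computation whenever the cached value is `nil` or `false`. The following is a plain-Ruby sketch of the defined-check approach; GitLab's real implementation lives in `Gitlab::Utils::StrongMemoize`, and the `ArticleStats` class here is hypothetical:

```ruby
class ArticleStats
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def archived?
    strong_memoize(:archived) { expensive_lookup }
  end

  private

  # Memoize by checking whether the instance variable is *defined*,
  # so falsey results (nil/false) are still cached.
  def strong_memoize(name)
    ivar = :"@#{name}"
    return instance_variable_get(ivar) if instance_variable_defined?(ivar)

    instance_variable_set(ivar, yield)
  end

  def expensive_lookup
    @calls += 1
    false # a falsey result like this would defeat `||=` memoization
  end
end

stats = ArticleStats.new
3.times { stats.archived? }
puts stats.calls # 1: the lookup ran once despite the falsey result
```

With `@archived ||= expensive_lookup` instead, `calls` would be 3, because the falsey memo never short-circuits the `||=`.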

## Cache expiration

### How Redis expires keys

In short: the oldest stuff is replaced with new stuff:

- A [useful article](https://redis.io/topics/lru-cache) about configuring Redis as an LRU cache.
- There are lots of options for different cache eviction strategies.
- You probably want `allkeys-lru`, which is functionally similar to Memcached.
- In Redis 4.0 and later, [`allkeys-lfu` is available](https://redis.io/topics/lru-cache#the-new-lfu-mode),
  which evicts the least frequently used keys rather than the least recently used ones.
- We now handle all explicit deletes using `UNLINK` instead of `DEL`, which allows Redis to
  reclaim memory in its own time, rather than immediately.
  - This marks a key as deleted and returns a successful value quickly,
    but actually deletes it later.

### How Rails expires keys

- Rails prefers using TTL and cache key expiry to using explicit deletes.
- Cache keys include a template tree digest by default when fragment caching in
  views, which ensures any changes to the template automatically expire the cache.
  - This isn't true in helpers, though, as a warning.
- Rails has two cache key methods on ActiveRecord objects: `cache_key_with_version` and `cache_key`.
  The first one is used by default in version 5.2 and later, and matches the standard behavior
  from before: it includes the `updated_at` timestamp in the key.

#### Cache key components

Example found in the `application.log`:

```plaintext
cache(@project, :tag_list)
views/projects/_home_panel:462ad2485d7d6957e03ceba2c6717c29/projects/16-20210316142425469452/tag_list
```

1. The view name and template tree digest:
   `views/projects/_home_panel:462ad2485d7d6957e03ceba2c6717c29`
1. The model name, ID, and `updated_at` values:
   `projects/16-20210316142425469452`
1. The symbol we passed in, converted to a string:
   `tag_list`

### Look for

- User-specific data:
  - This is the most important!
  - This isn't always obvious, particularly in views.
  - You must trawl every helper method that's used in the area you want to cache.
- Time-specific data, such as "Billy posted this 8 minutes ago".
- Records being updated without the `updated_at` field changing.
- Rails helpers roll the template digest into the keys in views, but this doesn't happen
  elsewhere, such as in helpers.
- `Grape::Entity` makes effective caching extremely difficult in the API layer. More on this later.
- Don't use `break` or `return` inside the fragment cache helper in views: it never writes a cache entry.
- Reordering items in a cache key, which could return old data:
  - For example, having two values that could return `nil` and swapping them around.
  - Use hashes, like `{ project: nil }`, instead.
- Rails calls `#cache_key` on members of an array to find the keys, but it doesn't call it on the values of hashes.
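The `nil`-reordering pitfall above can be sketched in plain Ruby. The `naive_key` helper here is a hypothetical key builder that drops `nil` parts, a pattern that shows up in hand-rolled cache key code; it is not Rails' actual `ActiveSupport::Cache.expand_cache_key`:

```ruby
# Hypothetical key builder that silently drops nil parts.
def naive_key(parts)
  parts.compact.map(&:to_s).join("/")
end

user = nil
project = "gitlab"

# With positional parts, "no user + project" and "project + no user"
# collapse into the same key, so one can read the other's cached data:
a = naive_key([user, project]) # "gitlab"
b = naive_key([project, user]) # "gitlab"
puts a == b # true -- collision

# Labeled hash parts keep each slot distinct even when a value is nil:
c = { user: nil, project: "gitlab" }.to_s
d = { user: "gitlab", project: nil }.to_s
puts c == d # false -- no collision
```

Labeling the parts with a hash makes the key self-describing, so a `nil` in one slot can never impersonate a `nil` in another.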