Diffstat (limited to 'doc/kernel/cache-policies.txt')
-rw-r--r-- | doc/kernel/cache-policies.txt | 102 |
1 files changed, 63 insertions, 39 deletions
diff --git a/doc/kernel/cache-policies.txt b/doc/kernel/cache-policies.txt
index 0d124a971..d3ca8af21 100644
--- a/doc/kernel/cache-policies.txt
+++ b/doc/kernel/cache-policies.txt
@@ -11,7 +11,7 @@ Every bio that is mapped by the target is referred to the policy.
 The policy can return a simple HIT or MISS or issue a migration.
 
 Currently there's no way for the policy to issue background work,
-e.g. to start writing back dirty blocks that are going to be evicte
+e.g. to start writing back dirty blocks that are going to be evicted
 soon.
 
 Because we map bios, rather than requests it's easy for the policy
@@ -25,53 +25,77 @@ trying to see when the io scheduler has let the ios run.
 Overview of supplied cache replacement policies
 ===============================================
 
-multiqueue
-----------
+multiqueue (mq)
+---------------
 
-This policy is the default.
-
-The multiqueue policy has three sets of 16 queues: one set for entries
-waiting for the cache and another two for those in the cache (a set for
-clean entries and a set for dirty entries).
+This policy is now an alias for smq (see below).
 
-Cache entries in the queues are aged based on logical time. Entry into
-the cache is based on variable thresholds and queue selection is based
-on hit count on entry. The policy aims to take different cache miss
-costs into account and to adjust to varying load patterns automatically.
+The following tunables are accepted, but have no effect:
 
-Message and constructor argument pairs are:
 	'sequential_threshold <#nr_sequential_ios>'
 	'random_threshold <#nr_random_ios>'
 	'read_promote_adjustment <value>'
 	'write_promote_adjustment <value>'
 	'discard_promote_adjustment <value>'
 
-The sequential threshold indicates the number of contiguous I/Os
-required before a stream is treated as sequential. Once a stream is
-considered sequential it will bypass the cache. The random threshold
-is the number of intervening non-contiguous I/Os that must be seen
-before the stream is treated as random again.
-
-The sequential and random thresholds default to 512 and 4 respectively.
-
-Large, sequential I/Os are probably better left on the origin device
-since spindles tend to have good sequential I/O bandwidth. The
-io_tracker counts contiguous I/Os to try to spot when the I/O is in one
-of these sequential modes. But there are use-cases for wanting to
-promote sequential blocks to the cache (e.g. fast application startup).
-If sequential threshold is set to 0 the sequential I/O detection is
-disabled and sequential I/O will no longer implicitly bypass the cache.
-Setting the random threshold to 0 does _not_ disable the random I/O
-stream detection.
-
-Internally the mq policy determines a promotion threshold. If the hit
-count of a block not in the cache goes above this threshold it gets
-promoted to the cache. The read, write and discard promote adjustment
-tunables allow you to tweak the promotion threshold by adding a small
-value based on the io type. They default to 4, 8 and 1 respectively.
-If you're trying to quickly warm a new cache device you may wish to
-reduce these to encourage promotion. Remember to switch them back to
-their defaults after the cache fills though.
+Stochastic multiqueue (smq)
+---------------------------
+
+This policy is the default.
+
+The stochastic multi-queue (smq) policy addresses some of the problems
+with the multiqueue (mq) policy.
+
+The smq policy (vs mq) offers the promise of less memory utilization,
+improved performance and increased adaptability in the face of changing
+workloads. smq also does not have any cumbersome tuning knobs.
+
+Users may switch from "mq" to "smq" simply by appropriately reloading a
+DM table that is using the cache target. Doing so will cause all of the
+mq policy's hints to be dropped. Also, performance of the cache may
+degrade slightly until smq recalculates the origin device's hotspots
+that should be cached.
+
+Memory usage:
+The mq policy used a lot of memory; 88 bytes per cache block on a 64
+bit machine.
+
+smq uses 28bit indexes to implement it's data structures rather than
+pointers. It avoids storing an explicit hit count for each block. It
+has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
+the entries (each hotspot block covers a larger area than a single
+cache block).
+
+All this means smq uses ~25bytes per cache block. Still a lot of
+memory, but a substantial improvement nontheless.
+
+Level balancing:
+mq placed entries in different levels of the multiqueue structures
+based on their hit count (~ln(hit count)). This meant the bottom
+levels generally had the most entries, and the top ones had very
+few. Having unbalanced levels like this reduced the efficacy of the
+multiqueue.
+
+smq does not maintain a hit count, instead it swaps hit entries with
+the least recently used entry from the level above. The overall
+ordering being a side effect of this stochastic process. With this
+scheme we can decide how many entries occupy each multiqueue level,
+resulting in better promotion/demotion decisions.
+
+Adaptability:
+The mq policy maintained a hit count for each cache block. For a
+different block to get promoted to the cache it's hit count has to
+exceed the lowest currently in the cache. This meant it could take a
+long time for the cache to adapt between varying IO patterns.
+
+smq doesn't maintain hit counts, so a lot of this problem just goes
+away. In addition it tracks performance of the hotspot queue, which
+is used to decide which blocks to promote. If the hotspot queue is
+performing badly then it starts moving entries more quickly between
+levels. This lets it adapt to new IO patterns very quickly.
+
+Performance:
+Testing smq shows substantially better performance than mq.
 
 cleaner
 -------
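
Reader's note on the "Level balancing" text added by this patch: the swap-on-hit idea can be illustrated with a small toy model. The sketch below is not the actual smq implementation; the class name LevelledQueue and its methods are hypothetical, and each level is modelled as a plain LRU-ordered set of block ids. It only shows how swapping a hit entry with the least recently used entry of the level above moves hot blocks upward without any per-block hit count, while the number of entries per level stays fixed.

```python
from collections import OrderedDict


class LevelledQueue:
    """Toy multi-level queue; each level is an LRU-ordered set of block ids.

    Hypothetical illustration only, not the kernel data structure.
    """

    def __init__(self, nr_levels):
        # The least recently used entry sits at the front of each level.
        self.levels = [OrderedDict() for _ in range(nr_levels)]

    def insert(self, block, level=0):
        self.levels[level][block] = True

    def level_of(self, block):
        for i, lvl in enumerate(self.levels):
            if block in lvl:
                return i
        raise KeyError(block)

    def sizes(self):
        return [len(lvl) for lvl in self.levels]

    def hit(self, block):
        """On a hit, swap the block with the LRU entry of the level above."""
        i = self.level_of(block)
        if i + 1 == len(self.levels):
            # Already in the top level: only refresh its recency.
            self.levels[i].move_to_end(block)
            return
        above = self.levels[i + 1]
        del self.levels[i][block]
        if above:
            # Demote the least recently used entry of the level above,
            # so both levels keep the same number of entries.
            victim, _ = above.popitem(last=False)
            self.levels[i][victim] = True
        above[block] = True


q = LevelledQueue(3)
for level, blocks in enumerate((["a", "b"], ["c", "d"], ["e", "f"])):
    for b in blocks:
        q.insert(b, level)

q.hit("a")  # "a" moves up one level, "c" (LRU of level 1) moves down
```

In this toy model the level sizes are chosen up front by how many entries are inserted per level, and repeated hits on one block walk it toward the top level one swap at a time. If the level above is empty the block simply moves up, which is a simplification of this sketch.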