# An operator's guide to smoosh

Smoosh is the auto-compactor for the databases. It automatically selects and processes the compacting of database shards on each node.

## Smoosh Channels

Smoosh works using the concept of channels. A channel is essentially a queue of pending compactions. There are separate sets of channels for database and view compactions. Each channel is assigned a configuration which defines whether a compaction ends up in the channel's queue and how compactions are prioritised within that queue.

Smoosh takes each channel and works through the compactions queued in it in priority order. Each channel is processed concurrently, so the priority levels only matter within a given channel. Finally, each channel has an assigned number of active compactions, which defines how many compactions happen for that channel in parallel. For example, a cluster with a lot of database churn but few views might require more active compactions for the database channel(s).

It's important to remember that a channel is local to a dbcore node; that is, each node maintains and processes an independent set of compactions.

### Channel configuration options

#### Channel types

Each channel has a basic type for the algorithm it uses to select pending compactions for its queue and how it prioritises them. There are two queue types:

* **ratio**: this uses the ratio `total_bytes / user_bytes` as its driving calculation. The result _X_ must be greater than some configurable value _Y_ for a compaction to be added to the queue. Compactions are then prioritised for higher values of _X_.
* **slack**: this uses `total_bytes - user_bytes` as its driving calculation. The result _X_ must be greater than some configurable value _Y_ for a compaction to be added to the queue. Compactions are prioritised for higher values of _X_.

In both cases, _Y_ is set using the `min_priority` configuration variable.
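To make the two formulas concrete, here is a minimal sketch (not smoosh's own code) that computes _X_ for each channel type and ranks two made-up shards the way a channel of that type would. The shard sizes are hypothetical, and the guard against dividing by zero is our own assumption.

```python
# Sketch of the "ratio" and "slack" channel types described above.
# Not smoosh's implementation; the division-by-zero guard is an assumption.

def priority(channel_type, total_bytes, user_bytes):
    """Return the driving value X for a "ratio" or "slack" channel."""
    if channel_type == "ratio":
        return total_bytes / max(user_bytes, 1)
    if channel_type == "slack":
        return total_bytes - user_bytes
    raise ValueError(f"unknown channel type: {channel_type!r}")


def should_enqueue(channel_type, min_priority, total_bytes, user_bytes):
    """A compaction is queued only if X is greater than Y (min_priority)."""
    return priority(channel_type, total_bytes, user_bytes) > min_priority


MiB, GiB = 2**20, 2**30

# Hypothetical shards: (name, total_bytes, user_bytes).
shards = [
    ("small_sparse", 50 * MiB, 1 * MiB),   # ratio 50, slack 49 MiB
    ("big_dense", 200 * GiB, 100 * GiB),   # ratio 2, slack 100 GiB
]


def rank(channel_type):
    """Order shards as a channel of this type would: highest X first."""
    ordered = sorted(shards, key=lambda s: priority(channel_type, s[1], s[2]), reverse=True)
    return [name for name, _, _ in ordered]


print("ratio order:", rank("ratio"))
print("slack order:", rank("slack"))
print("big_dense queued by a ratio channel with min_priority 5?",
      should_enqueue("ratio", 5.0, 200 * GiB, 100 * GiB))
```

A ratio channel favours the small shard that is mostly garbage, while a slack channel favours the large shard with the most reclaimable bytes. With a `min_priority` of 5, the large shard would not be queued by a ratio channel at all.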
The calculation of _X_ is described in [Priority calculation](#priority-calculation), below.

Both algorithms operate on two main measures:

* **active_bytes** (the `user_bytes` in the formulas above): this is the amount of data used by the btree structure and by the document bodies in the leaves of each document's revision tree. It includes storage overhead and the on-disk btree structure, but does not include document bodies in non-leaf nodes. So, for instance, after a document is deleted, its previous body becomes an intermediate node of the revision tree and its size is no longer reflected in the **active_bytes** amount.
* **total_bytes**: the size of the file on disk.

Channel type is set using the `priority` configuration setting.

There are also a few special "system" channels:

* **upgrade_dbs**: this is used for enqueuing database shards which need to be upgraded. This may happen when Apache CouchDB's data format changes.
* **upgrade_views**: channels used for enqueuing views which need to be upgraded. This may happen when the view disk format changes, or after a major version upgrade of the operating system's collation library (libicu). In the latter case, view shards are enqueued for recompaction so that their rows are re-ordered according to the rules of the new collation library.
* **cleanup_channels**: currently there is only a single **index_cleanup** channel, which is used to enqueue jobs that remove stale view index files and purge the view clients' checkpoint `_local` documents after design documents are updated.

#### Further configuration options

Beyond its basic type, there are several other configuration options which can be applied to a queue.

*All options MUST be set as strings.*

See the [smoosh readme][srconfig] for all settings and their defaults.

#### Priority calculation

The algorithm type and certain configuration options feed into the priority calculation. The priority is calculated when a compaction is enqueued.
As each channel has a different configuration, each channel will end up with a different priority value. The enqueue code checks each channel in turn to see whether the compaction passes its configured priority threshold (`min_priority`). Once a channel is found that can accept the compaction, the compaction is added to that channel's queue and the enqueue process stops. Therefore the ordering of channels has a bearing on which channel a compaction ends up in.

If you want to follow this along, the call order is all in `smoosh_server`: `enqueue_request -> find_channel -> get_priority`.

The priority calculation is probably the easiest way to understand the effects of configuration variables. It's defined in `smoosh_server#get_priority/3`, currently [here][ss].

[ss]: https://github.com/apache/couchdb-smoosh/blob/master/src/smoosh_server.erl#L277
[srconfig]: https://github.com/apache/couchdb-smoosh#channel-settings

#### Background Detail

`user_bytes` is called `sizes.active` in `db_info` blocks. It is the total of all bytes that are used to store docs and their attachments visible in the leaf nodes of document revision trees.

Since `.couch` files are append only, every update adds data to the file. When you update a btree, a new leaf node is written, along with every node on the path back up to the root. Old data is never overwritten in this process, so the superseded parts of the file (old btree nodes and document bodies) are no longer live. Compaction takes this file and writes a new file that only contains live data.

`total_bytes` is the number of bytes in the file as reported by `ls -al filename`. In a `db_info` response this is the `sizes.file` value.

### Defining a channel

Defining a channel is done via normal dbcore configuration, with some convention as to the parameter names. Channel configuration is defined using `smoosh.{channel-name}` top level config options.
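Because the enqueue code stops at the first channel whose `min_priority` threshold a compaction passes (see [Priority calculation](#priority-calculation)), where a channel sits in the channel list matters when you define one. The sketch below illustrates this with a hypothetical `db_info` excerpt and two hypothetical channel definitions. It is a simplified model, not a port of `find_channel`: treating `min_size` as a hard gate, and all of the values, are assumptions.

```python
import json

# Hypothetical db_info excerpt; only "sizes" is used. sizes.file is
# total_bytes and sizes.active is user_bytes (see Background Detail).
db_info = json.loads('{"db_name": "example", "sizes": {"file": 64000000000, "active": 8000000000}}')

# Hypothetical channel definitions, listed in db_channels order.
# "priority" is the channel type; treating min_size as a hard gate
# (priority 0 below it) is an assumption, as are all the values.
channels = [
    {"name": "ratio_dbs", "priority": "ratio", "min_priority": 5.0, "min_size": 0},
    {"name": "big_dbs", "priority": "ratio", "min_priority": 5.0, "min_size": 20_000_000_000},
]


def get_priority(channel, total_bytes, user_bytes):
    if total_bytes < channel["min_size"]:
        return 0  # too small for this channel (assumed behaviour)
    if channel["priority"] == "ratio":
        return total_bytes / max(user_bytes, 1)
    return total_bytes - user_bytes  # "slack"


def find_channel(channel_list, total_bytes, user_bytes):
    """Return the first channel whose min_priority threshold is passed."""
    for channel in channel_list:
        if get_priority(channel, total_bytes, user_bytes) > channel["min_priority"]:
            return channel["name"]
    return None  # no channel accepts the compaction


total, active = db_info["sizes"]["file"], db_info["sizes"]["active"]
print(find_channel(channels, total, active))
print(find_channel(list(reversed(channels)), total, active))
```

With `ratio_dbs` listed first, this 64 GB shard (ratio 8) lands in `ratio_dbs`; reversing the list sends the same shard to `big_dbs` instead.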
Defining a channel is just a matter of setting the options you want for the channel, then bringing it into smoosh's sets of active channels by adding it to either `db_channels` or `view_channels`.

This means that smoosh channels can be defined either for a single node or globally across a cluster, by setting the configuration either locally or globally. In the example below, we set up a new global channel.

It's important to choose good channel names. There are some conventional ones:

* `ratio_dbs`: a ratio channel for dbs, usually using the default settings.
* `slack_dbs`: a slack channel for dbs, usually using the default settings.
* `ratio_views`: a ratio channel for views, usually using the default settings.
* `slack_views`: a slack channel for views, usually using the default settings.

These four are defined by default, along with three **system** channels:

* `upgrade_dbs`: upgrade channel for dbs, used when the db file format changes.
* `upgrade_views`: upgrade channel for views, used when the view file format changes or after the operating system's collation library undergoes a major version change.
* `index_cleanup`: a single channel in the `cleanup_channels` list, used for enqueuing jobs that clean up stale index files.

And some standard names for ones we often have to add:

* `big_dbs`: a ratio channel for only enqueuing large database shards. What _large_ means is very workload specific.

Channels have certain defaults for their configuration, defined in the [smoosh readme][srconfig]. It's only necessary to set up how this channel differs from those defaults. Below, we just need to set the `min_size` and `concurrency` settings, and allow the `priority` to default to `ratio` along with the other defaults.

```erlang
%% Define the new channel
(couchdb@db1.foo.bar)3> rpc:multicall(config, set, ["smoosh.big_dbs", "min_size", "20000000000"]).
{[ok,ok,ok],[]}
(couchdb@db1.foo.bar)3> rpc:multicall(config, set, ["smoosh.big_dbs", "concurrency", "2"]).
{[ok,ok,ok],[]}

%% Add the channel to the db_channels set. Note that we need to get the
%% original value first so we can add the new one to the existing list!
(couchdb@db1.foo.bar)5> rpc:multicall(config, get, ["smoosh", "db_channels"]).
{["ratio_dbs","ratio_dbs","ratio_dbs"],[]}
(couchdb@db1.foo.bar)6> rpc:multicall(config, set, ["smoosh", "db_channels", "ratio_dbs,big_dbs"]).
{[ok,ok,ok],[]}
```

### Viewing active channels

```erlang
(couchdb@db3.foo.bar)3> rpc:multicall(config, get, ["smoosh", "db_channels"]).
{["ratio_dbs,big_dbs","ratio_dbs,big_dbs","ratio_dbs,big_dbs"],[]}
(couchdb@db3.foo.bar)4> rpc:multicall(config, get, ["smoosh", "view_channels"]).
{["ratio_views","ratio_views","ratio_views"],[]}
```

### Removing a channel

```erlang
%% Remove it from the active set
(couchdb@db1.foo.bar)5> rpc:multicall(config, get, ["smoosh", "db_channels"]).
{["ratio_dbs,big_dbs", "ratio_dbs,big_dbs", "ratio_dbs,big_dbs"],[]}
(couchdb@db1.foo.bar)6> rpc:multicall(config, set, ["smoosh", "db_channels", "ratio_dbs"]).
{[ok,ok,ok],[]}

%% Delete the config: you need to delete each value separately
(couchdb@db1.foo.bar)3> rpc:multicall(config, delete, ["smoosh.big_dbs", "concurrency"]).
{[ok,ok,ok],[]}
(couchdb@db1.foo.bar)3> rpc:multicall(config, delete, ["smoosh.big_dbs", "min_size"]).
{[ok,ok,ok],[]}
```

### Getting channel configuration

As far as I know, you have to get each setting separately:

```erlang
(couchdb@db1.foo.bar)1> rpc:multicall(config, get, ["smoosh.big_dbs", "concurrency"]).
{["2","2","2"],[]}
```

### Setting channel configuration

As when defining a channel, you just need to set the new value:

```erlang
(couchdb@db1.foo.bar)2> rpc:multicall(config, set, ["smoosh.ratio_dbs", "concurrency", "1"]).
{[ok,ok,ok],[]}
```

It sometimes takes a little while to take effect.

## Standard operating procedures

There are a few standard things that operators often have to do when responding to pages.
In addition to the procedures below, in some circumstances it's useful to define new channels with certain properties (`big_dbs` is a common one) if smoosh isn't selecting and prioritising compactions well.

### Checking smoosh's status

You can see the queued items for each channel by going into `remsh` on a node and using:

```erlang
> smoosh:status().
{ok,[{"ratio_dbs",
      [{active,1},
       {starting,0},
       {waiting,[{size,522},
                 {min,{5.001569007970237,{1378,394651,323864}}},
                 {max,{981756.5441159063,{1380,370286,655752}}}]}]},
     {"slack_views",
      [{active,1},
       {starting,0},
       {waiting,[{size,819},
                 {min,{16839814,{1375,978920,326458}}},
                 {max,{1541336279,{1380,370205,709896}}}]}]},
     {"slack_dbs",
      [{active,1},
       {starting,0},
       {waiting,[{size,286},
                 {min,{19004944,{1380,295245,887295}}},
                 {max,{48770817098,{1380,370185,876596}}}]}]},
     {"ratio_views",
      [{active,1},
       {starting,0},
       {waiting,[{size,639},
                 {min,{5.0126340031149335,{1380,186581,445489}}},
                 {max,{10275.555632057285,{1380,370411,421477}}}]}]}]}
```

This gives you the node-local status for each queue. Under each channel there is some information about the channel:

* `active`: number of current compactions in the channel.
* `starting`: number of compactions starting up.
* `waiting`: information about the queued compactions. `size` is the number of queued compactions, while `min` and `max` give an idea of the queued jobs' effectiveness. The first element of each of these is the priority value, so its meaning depends on whether the queue is ratio or slack.

For ratio queues, the default minimum for smoosh to enqueue a compaction is 5. In the example above, we can guess that 981,756 is quite high. This could be a small database, however, so it doesn't necessarily mean useful compactions from the point of view of reclaiming disk space.

For this example, we can see that there are quite a lot of queued compactions, but we don't know which would be most effective to run to reclaim disk space. It's also worth noting that the waiting queue sizes are only meaningful relative to other factors on the cluster (e.g., db number and size).
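Because a ratio hides the file size, it can help to translate a ratio back into bytes. Since _X_ = `total_bytes / user_bytes`, a compaction frees roughly `total_bytes - user_bytes = total_bytes * (1 - 1/X)`. The sketch below applies this to the two `ratio_dbs` values from the status output above. `smoosh:status()` does not report file sizes, so the sizes here are made-up assumptions; in practice you would look them up, for example as `sizes.file` in `db_info`.

```python
MiB, GiB = 2**20, 2**30


def reclaimable_bytes(total_bytes, ratio):
    """Approximate bytes freed by compaction: total - user = total * (1 - 1/ratio)."""
    return total_bytes * (1 - 1 / ratio)


# Ratios taken from the status example; file sizes are hypothetical.
examples = [
    ("ratio_dbs max", 100 * MiB, 981756.5441159063),
    ("ratio_dbs min", 100 * GiB, 5.001569007970237),
]

for label, total_bytes, ratio in examples:
    freed = reclaimable_bytes(total_bytes, ratio)
    print(f"{label}: ratio {ratio:,.1f}, ~{freed / GiB:.2f} GiB reclaimable")
```

Under these assumed sizes, the shard with the highest ratio frees only about 0.1 GiB, while the one barely above the threshold frees about 80 GiB.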
### Smoosh IOQ priority

This is a global setting which affects all channels. Increasing it allows each active compaction to (hopefully) proceed faster, as the compaction work is given a higher priority relative to other jobs. Decreasing it (hopefully) has the converse effect.

By this point you'll [know whether smoosh is backing up](#checking-smooshs-status). If it's falling behind (big queues), try increasing compaction priority.

Smoosh's IOQ priority is controlled via the `compaction` setting in the `ioq` config section:

```erlang
> rpc:multicall(config, get, ["ioq", "compaction"]).
{[undefined,undefined,undefined],[]}
```

Priority by convention runs from 0 to 1, though the priority can be any positive number. The default for compaction is 0.01, which is pretty low.

If it looks like smoosh has a bunch of work that it's not getting through, priority can be increased. However, be careful that this doesn't adversely impact the customer experience. If it will, and it's urgent, at least drop them a warning.

```erlang
> rpc:multicall(config, set, ["ioq", "compaction", "0.5"]).
{[ok,ok,ok],[]}
```

In general, this should be a temporary measure. For some clusters, a change from the default may be required to help smoosh keep up with particular workloads.

### Granting specific channels more workers

Giving smoosh a higher concurrency for a given channel can allow a backlog in that channel to catch up. Again, some clusters run best with specific channels having more workers.

From [assessing disk space](#assess-the-space-on-the-disk), you should know whether the biggest offenders are db or view files. From this, you can infer whether it's worth giving a specific smoosh channel a higher concurrency.

The current setting can be seen for a channel like so:

```erlang
> rpc:multicall(config, get, ["smoosh.ratio_dbs", "concurrency"]).
{["2","2","2"], []}
```

`undefined` means the default is used.

If we knew that DBs were the major user of disk space, we might want to increase a `_dbs` channel.
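One rough way to make the db-versus-view comparison is to sum the reclaimable bytes (`total_bytes - user_bytes`) for db files and view files separately. The sketch below does this for a made-up inventory; the data, and the assumption that you have already collected file and active sizes per shard, are ours rather than anything smoosh provides.

```python
GiB = 2**30

# Hypothetical per-node inventory: (kind, file bytes, active bytes).
# For dbs these would be sizes.file / sizes.active from db_info; for views,
# the corresponding view index sizes (an assumption about where you get them).
shards = [
    ("db", 40 * GiB, 10 * GiB),
    ("db", 5 * GiB, 4 * GiB),
    ("view", 12 * GiB, 9 * GiB),
]


def slack_by_kind(inventory):
    """Sum total_bytes - user_bytes separately for db and view files."""
    totals = {}
    for kind, file_bytes, active_bytes in inventory:
        totals[kind] = totals.get(kind, 0) + (file_bytes - active_bytes)
    return totals


totals = slack_by_kind(shards)
worst = max(totals, key=totals.get)
for kind, slack in sorted(totals.items()):
    print(f"{kind}: ~{slack / GiB:.0f} GiB reclaimable")
print(f"Consider raising concurrency on a _{worst}s channel")
```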
Experience shows `ratio_dbs` is often best, but evaluate this based on the current status.

If we want to increase the `ratio_dbs` setting from the value shown above:

```erlang
> rpc:multicall(config, set, ["smoosh.ratio_dbs", "concurrency", "3"]).
{[ok,ok,ok],[]}
```

### Suspending smoosh

If smoosh itself is causing issues, it's possible to suspend its operation. This differs from either `application:stop(smoosh).` or setting all channels' concurrency to zero because it both pauses ongoing compactions and keeps the channel queues intact.

If, for example, a node's compactions are causing disk space issues, smoosh could be suspended while working out which channel is causing the problem. For example, a `big_dbs` channel might be creating huge compaction-in-progress files if there's not much in the shard to compact away.

It's therefore useful when testing to see if smoosh is causing a problem.

```erlang
%% Suspend smoosh
smoosh:suspend().

%% Resume a suspended smoosh
smoosh:resume().
```

Suspend is currently pretty literal: `erlang:suspend_process(Pid, [unless_suspending])` is called for each compaction process in each channel, and `erlang:resume_process/1` is called on resume.

### Disable a channel

An alternative to suspending smoosh is to disable an individual channel by setting its concurrency value to `"0"`:

```erlang
rpc:multicall(config, set, ["smoosh.ratio_dbs", "concurrency", "0"]).
```

### Restarting Smoosh

Restarting smoosh is a long shot: a brute-force approach in the hope that, when smoosh rescans the DBs, it makes the right decisions. If you need to take this step, contact rnewson or davisp so that they can inspect smoosh and find the bug.

```erlang
> exit(whereis(smoosh_server), kill), smoosh:enqueue_all_dbs(), smoosh:enqueue_all_views().
```