author | Nick Vatamaniuc <vatamane@gmail.com> | 2022-11-16 22:32:35 -0500 |
---|---|---|
committer | Nick Vatamaniuc <nickva@users.noreply.github.com> | 2022-11-18 17:04:30 -0500 |
commit | 3c24731a5e49bbb4a1d1f407f11f141ca8698e6c (patch) | |
tree | e4107c1c74edec9d8037906c0cf7d78f1f40545e | |
parent | 111f2616e1ef48fadb43dcc358bb8dabb69ad839 (diff) | |
Update smoosh documentation
* Remove the state chart. With activated/not-activated state gone, we don't
need it any longer.
* Describe the cleanup channels.
* Add upgrade db and view channel references in a few places.
* Remove references to `external` or `data_size` and other previous compaction
size metrics used for triggering compactions. Replace references with
`active` size.
* Use double back-ticks in a few places instead of single back-ticks due to
differences in RST vs MD. In RST code litterals need double back-ticks.
-rw-r--r-- | src/docs/src/maintenance/compaction.rst | 28 |
-rw-r--r-- | src/smoosh/README.md | 33 |
-rw-r--r-- | src/smoosh/operator_guide.md | 89 |
-rw-r--r-- | src/smoosh/recovery_process_diagram.jpeg | bin | 51388 -> 0 bytes |
4 files changed, 74 insertions, 76 deletions
diff --git a/src/docs/src/maintenance/compaction.rst b/src/docs/src/maintenance/compaction.rst
index c15344f11..5f3ff02cc 100644
--- a/src/docs/src/maintenance/compaction.rst
+++ b/src/docs/src/maintenance/compaction.rst
@@ -93,6 +93,7 @@ configuration setting in the ``[smoosh]`` block. The default configuration is
     [smoosh]
     db_channels = upgrade_dbs,ratio_dbs,slack_dbs
     view_channels = upgrade_views,ratio_views,slack_views
+    cleanup_channels = index_cleanup
 
     [smoosh.ratio_dbs]
     priority = ratio
@@ -110,18 +111,23 @@ configuration setting in the ``[smoosh]`` block. The default configuration is
     priority = slack
     min_priority = 536870912
 
-The "upgrade" channels are a special pair of channels that only check whether
-the `disk_format_version` for the file matches the current version, and enqueue
-the file for compaction (which has the side effect of upgrading the file format)
-if that's not the case. There are several additional properties that can be
-configured for each channel; these are documented in the :ref:`configuration API
-<config/compactions>`
+The "upgrade" and "cleanup_channels" are special system channels. The "upgrade"
+ones check whether the ``disk_format_version`` for the file matches the current
+version, and enqueue the file for compaction (which has the side effect of
+upgrading the file format) if that's not the case. In addition to that, the
+``upgrade_views`` will enqueue views for compaction after the collation
+(libicu) library is upgraded. The "index_cleanup" channel is used for
+scheduling jobs used to remove stale index files and purge _local checkpoint
+document after design documents are updated.
+
+Here are several additional properties that can be configured for each channel;
+these are documented in the :ref:`configuration API <config/compactions>`
 
 Scheduling Windows
 ------------------
 
 Each compaction channel can be configured to run only during certain hours of
-the day. The channel-specific `from`, `to`, and `strict_window` configuration
+the day. The channel-specific ``from``, ``to``, and ``strict_window`` configuration
 settings control this behavior. For example
 
 .. code-block:: ini
@@ -131,7 +137,7 @@ settings control this behavior. For example
     to = 06:00
     strict_window = true
 
-where `overnight_channel` is the name of the channel you want to configure.
+where ``overnight_channel`` is the name of the channel you want to configure.
 
 Note: CouchDB determines time via the UTC (GMT) timezone, so these settings
 must be expressed as UTC (GMT).
@@ -220,9 +226,9 @@ Manual Database Compaction
 
 Database compaction compresses the database file by removing unused file
 sections created during updates. Old documents revisions are replaced with
-small amount of metadata called `tombstone` which are used for conflicts
+small amount of metadata called ``tombstone`` which are used for conflicts
 resolution during replication. The number of stored revisions
-(and their `tombstones`) can be configured by using the :get:`_revs_limit
+(and their ``tombstones``) can be configured by using the :get:`_revs_limit
 </{db}/_revs_limit>` URL endpoint.
 
 Compaction can be manually triggered per database and runs as a background
@@ -326,7 +332,7 @@ is actually running. To track the compaction progress you may query the
 Manual View Compaction
 ======================
 
-`Views` also need compaction. Unlike databases, views are compacted by groups
+Views also need compaction. Unlike databases, views are compacted by groups
 per `design document`. To start their compaction, send the HTTP
 :post:`/{db}/_compact/{ddoc}` request::
 
diff --git a/src/smoosh/README.md b/src/smoosh/README.md
index 9f9a48074..31d111ba3 100644
--- a/src/smoosh/README.md
+++ b/src/smoosh/README.md
@@ -24,6 +24,8 @@ The main settings one interacts with are:
 databases.
 <dt>view_channels<dd>A comma-separated list of channel names for
 views.
+<dt>cleanup_channels<dd>A comma-separated list of channel names
+for cleaning old index files.
 <dt>staleness<dd>The number of minutes that the (expensive) priority
 calculation can be stale for before it is recalculated. Defaults to 5.
 </dl>
@@ -32,9 +34,7 @@ Sometimes it's necessary to use the following:
 
 <dl>
 <dt>cleanup_index_files</dt><dd>Whether smoosh cleans up the files
-for indexes that have been deleted. Defaults to false and probably
-shouldn't be changed unless the cluster is running low on disk space,
-and only after considering the ramifications.</dd>
+for indexes that have been deleted. Defaults to true but may be switched to false.</dd>
 <dt>wait_secs</dt><dd>The time a channel waits before starting compactions
 to allow time to observe the system and make a smarter decision about
 what to compact first. Hardly ever changed from the default. Default 30 (seconds).
@@ -68,11 +68,13 @@ properly managed by OTP yet.
 Compaction Scheduling Algorithm
 -------------------------------
 
-Smoosh decides whether to compact a database or view by evaluating the
-item against the selection criteria of each _channel_ in the order
-they are configured. By default there are two channels for databases
-("ratio_dbs" and "slack_dbs"), and two channels for views ("ratio_views"
-and "slack_views")
+Smoosh decides whether to compact a database or view by evaluating the item
+against the selection criteria of each _channel_ in the order they are
+configured. By default there are three channels for databases ("ratio_dbs",
+"slack_dbs" and "upgrade_dbs"), three channels for views ("ratio_views",
+"slack_views" and "upgrade_views"). The "cleanup_channels" has only the
+"index_cleanup" channel. That channel is for enqueueing stale index file
+cleanup jobs.
 
 Smoosh will enqueue the new item to the first channel that accepts
 it. If none accept it, the item is not enqueued for compaction.
@@ -80,18 +82,9 @@ it. If none accept it, the item is not enqueued for compaction.
 Notes on the data_size value
 ----------------------------
 
-Every database and view shard has a data_size value. In CouchDB this
-accurately reflects the post-compaction file size. In DbCore, it is
-the size of the file that we bill for. It excludes the b+tree and
-database footer overhead. We also bill customers for the uncompressed
-size of their documents, though we store them compressed on disk.
-These two systems were developed independently (ours predates
-CouchDB's) and DbCore only calculates the billing size value.
-
-Because of the way our data_size is currently calculated, it can
-sometimes be necessary to enqueue databases and views with very low
-ratios. Due to this, it is also currently impossible to tell how
-optimally compacted a cluster is.
+Every database and view shard has an active size value. In CouchDB this
+accurately reflects the post-compaction file size plus the b+tree metadata and
+database footer overhead.
 
 Example config commands
 -----------------------
diff --git a/src/smoosh/operator_guide.md b/src/smoosh/operator_guide.md
index fafee30d4..4764333bd 100644
--- a/src/smoosh/operator_guide.md
+++ b/src/smoosh/operator_guide.md
@@ -29,7 +29,7 @@ each node maintains and processes an independent set of compactions.
 Each channel has a basic type for the algorithm it uses to select pending
 compactions for its queue and how it prioritises them.
 
-The two queue types are:
+There are a few queue types:
 
 * **ratio**: this uses the ratio `total_bytes / user_bytes` as its driving
 calculation. The result _X_ must be greater than some configurable value _Y_ for a
@@ -45,14 +45,33 @@ calculation of _X_ is described in [Priority calculation](#priority-calculation)
 
 Both algorithms operate on two main measures:
 
-* **user_bytes**: this is the amount of data the user has in the file. It
-doesn't include storage overhead: old revisions, on-disk btree structure and
-so on.
+* **active_bytes**: this is the amount of data used by btree structure and the
+document bodies in the leaves of the revision tree of each document. It
+includes storage overhead, on-disk btree structure but does not include document
+bodies not in leaf nodes. So, for instance, after deleting a document, that
+document's body revision will become an intermediate revision tree node and its
+size won't be relfected in the **active_bytes** ammount.
 
 * **total_bytes**: the size of the file on disk.
 
 Channel type is set using the `priority` configuration setting.
 
+There are also a few special "system" channels:
+
+* **upgrade_dbs** : this is used for enqueuing database shards which need to be
+  upgraded. This may happen after when Apache CouchDB's data format changes.
+
+* **upgrade_views** : channels used for enqueuing views which need to be
+  upgraded. This may happen when view disk format changes, or after operation
+  system's collation library (libicu) major version upgrade. Then, view shard
+  will be enqueued for recompaction, so their rows are re-ordered according the
+  updated rules of the new collation library.
+
+* **cleanup_channels** : currently there is only a single **index_cleanup**
+  channel which is used to enqueue jobs used to remove stale view index files
+  and purge view client checkpoint _local document after design documents get
+  updated.
+
 #### Further configuration options
 
 Beyond its basic type, there are several other configuration options which
@@ -86,8 +105,8 @@ currently [here][ss].
 
 #### Background Detail
 
-`user_bytes` is called `data_size` in `db_info` blocks. It is the total of all bytes
-that are used to store docs and their attachments.
+`user_bytes` is called `sizes.active` in `db_info` blocks. It is the total of all bytes
+that are used to store docs and their attachments visible in the leaf nodes of document revision trees.
 
 Since `.couch` files are append only, every update adds data to the file.
 When you update a btree, a new leaf node is written and all the nodes back up the
@@ -95,48 +114,15 @@ root. In this update, old data is never overwritten and these parts of the file
 are no longer live; this includes old btree nodes and document bodies. Compaction
 takes this file and writes a new file that only contains live data.
 
-`total_data` is the number of bytes in the file as reported by `ls -al filename`.
-
-#### Flaws
-
-An important flaw in this calculation is that `total_data` takes into account
-the compression of data on disk, whereas `user_bytes` does not. This can give
-unexpected results to calculations, as the values are not directly comparable.
-
-However, it's the best measure we currently have.
-
-[Even more info](https://github.com/apache/couchdb-smoosh#notes-on-the-data_size-value).
-
-#### State diagram
-
-Below is a diagram of smoosh's initial state during the recovery process.
-
-```
-stateDiagram
-    [*] --> init
-    init --> start_recovery: send_after(?START_DELAY_IN_MSEC, self(), start_recovery)
-    note right of start_recovery
-        activated = false
-        paused = true
-    end note
-    start_recovery --> activate: send_after(?ACTIVATE_DELAY_IN_MSEC, self(), activate)
-    note right of activate
-        state has been recovered
-        activated = true
-        paused = true
-    end note
-    activate --> schedule_unpause
-    schedule_unpause --> [*]: after 30 sec, paused = false and compaction of new jobs begin
-```
-
-![Smoosh State Recovery Process Diagram](recovery_process_diagram.jpeg)
+`total_data` is the number of bytes in the file as reported by `ls -al
+filename`. In `db_info` reponse this is the `sizes.file` value.
 
 ### Defining a channel
 
 Defining a channel is done via normal dbcore configuration, with some convention
 as to the parameter names.
 
-Channel configuration is defined using `smoosh.channel_name` top level config
+Channel configuration is defined using `smoosh.$channel_name` top level config
 options. Defining a channel is just setting the various options you want for
 the channel, then bringing it into smoosh's sets of active channels by adding
 it to either `db_channels` or `view_channels`.
@@ -152,9 +138,14 @@ It's important to choose good channel names. There are some conventional ones:
 * `ratio_views`: a ratio channel for views, usually using the default settings.
 * `slack_views`: a slack channel for views, usually using the default settings.
 
-These four are defined by default if there are no others set ([source][source1]).
+These four are defined by default along with three **system** channel:
 
-[source1]: https://github.com/apache/couchdb-smoosh/blob/master/src/smoosh_server.erl#L75
+* `upgrade_dbs`: update channel for dbs, used when db file format changes
+* `upgrade_views` : update channel for views, used when view file format
+  changes or after the operating system's collation library undergoes a major
+  version change.
+* `index_cleanup` : a single channel in the `cleanup_channels` list used for
+  enqueueing jobs used to clean up stale index files.
 
 And some standard names for ones we often have to add:
 
@@ -229,7 +220,6 @@ The same as defining a channel, you just need to set the new value:
 
 It sometimes takes a little while to take affect.
 
-
 ## Standard operating procedures
 
 There are a few standard things that operators often have to do when responding
@@ -387,6 +377,15 @@ Suspend is currently pretty literal: `erlang:suspend_process(Pid, [unless_suspen
 is called for each compaction process in each channel. `resume_process` is called
 for resume.
 
+### Disable a channel
+
+An alternative to pausing a channel is to disable it by setting its concurrency
+value to `"0"`.
+
+```
+rpc:multicall(config, set, ["smoosh.ratio_dbs", "concurrency", "0"]).
+```
+
 ### Restarting Smoosh
 
 Restarting Smoosh is a long shot and is a brute force approach in the hope that
diff --git a/src/smoosh/recovery_process_diagram.jpeg b/src/smoosh/recovery_process_diagram.jpeg
deleted file mode 100644
index 300db5cd0..000000000
--- a/src/smoosh/recovery_process_diagram.jpeg
+++ /dev/null
Binary files differ
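
Reviewer note: the patch describes ratio and slack channels in terms of the file size (`sizes.file`) and the active size (`sizes.active`), and the README states that an item goes to the first channel that accepts it. The Python sketch below illustrates that selection rule only; it is not smoosh's implementation. The names `CHANNELS`, `priority` and `first_accepting_channel` are hypothetical, the ratio threshold of `2.0` is an assumed value for illustration, and the slack formula (file size minus active size) is taken from the operator guide rather than from this diff. Real channels also apply further options (size limits, scheduling windows) that are ignored here.

```python
from typing import List, Optional, Tuple

# Hypothetical channel table in evaluation order: (name, priority type, min_priority).
# 536870912 is the slack min_priority shown in the default config in the diff;
# 2.0 for the ratio channel is an assumption made for this sketch.
CHANNELS: List[Tuple[str, str, float]] = [
    ("ratio_dbs", "ratio", 2.0),
    ("slack_dbs", "slack", 536870912),
]


def priority(kind: str, file_bytes: int, active_bytes: int) -> float:
    """Compute the driving value X for a channel type."""
    if kind == "ratio":
        # Ratio of on-disk size to active size. The zero guard is an
        # arbitrary choice for this sketch, not smoosh behaviour.
        if active_bytes <= 0:
            return float("inf") if file_bytes > 0 else 0.0
        return file_bytes / active_bytes
    if kind == "slack":
        # Bytes that compaction could be expected to reclaim.
        return float(file_bytes - active_bytes)
    raise ValueError(f"unknown priority type: {kind}")


def first_accepting_channel(
    file_bytes: int,
    active_bytes: int,
    channels: List[Tuple[str, str, float]] = CHANNELS,
) -> Optional[str]:
    """Return the first channel whose X exceeds its min_priority, else None."""
    for name, kind, min_priority in channels:
        if priority(kind, file_bytes, active_bytes) > min_priority:
            return name
    return None


if __name__ == "__main__":
    gib = 2**30
    print(first_accepting_channel(300, 100))           # small file, high ratio
    print(first_accepting_channel(10 * gib, 9 * gib))  # low ratio, large slack
    print(first_accepting_channel(150, 100))           # accepted by neither
```

With the thresholds assumed above, a small file that is mostly garbage lands in the ratio channel, while a large file with a low ratio but more than 512 MiB of reclaimable space falls through to the slack channel.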