summaryrefslogtreecommitdiff
path: root/src/third_party/wiredtiger/src/docs/tune-page-sizes.dox
diff options
context:
space:
mode:
Diffstat (limited to 'src/third_party/wiredtiger/src/docs/tune-page-sizes.dox')
-rw-r--r--src/third_party/wiredtiger/src/docs/tune-page-sizes.dox142
1 files changed, 142 insertions, 0 deletions
diff --git a/src/third_party/wiredtiger/src/docs/tune-page-sizes.dox b/src/third_party/wiredtiger/src/docs/tune-page-sizes.dox
new file mode 100644
index 00000000000..130e047a02d
--- /dev/null
+++ b/src/third_party/wiredtiger/src/docs/tune-page-sizes.dox
@@ -0,0 +1,142 @@
+/*! @page tune_page_sizes Page and overflow key/value sizes
+
+There are seven page and key/value size configuration strings:
+
+- allocation size (\c allocation_size),
+- page sizes (\c internal_page_max and \c leaf_page_max),
+- key and value sizes (\c internal_key_max, \c leaf_key_max and \c leaf_value_max), and the
+- page-split percentage (\c split_pct).
+
+All seven are specified to the WT_SESSION::create method, in other
+words, they are configurable on a per-file basis.
+
+Applications commonly configure page sizes, based on their workload's
+typical key and value size. Once the correct page size has been chosen,
+appropriate defaults for the other configuration values are derived from
+the page sizes, and relatively few applications will need to modify the
+other page and key/value size configuration options.
+
+An example of configuring page and key/value sizes:
+
+@snippet ex_all.c Create a table and configure the page size
+
+@section tune_page_sizes_sizes Page, key and value sizes
+
+The \c internal_page_max and \c leaf_page_max configuration values
+specify a maximum size for Btree internal and leaf pages. That is, when
+an internal or leaf page grows past that size, it splits into multiple
+pages. Generally, internal pages should be sized to fit into on-chip
+caches in order to minimize cache misses when searching the tree, while
+leaf pages should be sized to maximize I/O performance (if reading from
+disk is necessary, it is usually desirable to read a large amount of
+data, assuming some locality of reference in the application's access
+pattern).
+
+The default page size configurations (2KB for \c internal_page_max, 32KB
+for \c leaf_page_max), are appropriate for applications with relatively
+small keys and values.
+
+- Applications doing full-table scans through out-of-memory workloads
+might increase both internal and leaf page sizes to transfer more data
+per I/O.
+- Applications focused on read/write amplification might decrease the page
+size to better match the underlying storage block size.
+
+When block compression has been configured, configured page sizes will
+not match the actual size of the page on disk. Block compression in
+WiredTiger happens within the I/O subsystem, and so a page might split
+even if subsequent compression would result in a resulting page size
+small enough to leave as a single page. In other words, page sizes are
+based on in-memory sizes, not on-disk sizes. Applications needing to
+write specific sized blocks may want to consider implementing a
+WT_COMPRESSOR::compress_raw function.
+
+The page sizes also determine the default size of overflow items, that
+is, keys and values too large to easily store on a page. Overflow items
+are stored separately in the file from the page where the item logically
+appears, and so reading or writing an overflow item is more expensive
+than an on-page item, normally requiring additional I/O. Additionally,
+overflow values are not cached in memory. This means overflow items
+won't affect the caching behavior of the application, but it also means
+that each time an overflow value is read, it is re-read from disk.
+
+For both of these reasons, applications should avoid creating large
+numbers of commonly referenced overflow items. This is especially
+important for keys, as keys on internal pages are referenced during
+random searches, not just during data retrieval. Generally,
+applications should make every attempt to avoid creating overflow keys.
+
+- Applications with large keys and values, and concerned with latency,
+might increase the page size to avoid creating overflow items, in order
+to avoid the additional cost of retrieving them.
+
+- Applications with large keys and values, doing random searches, might
+decrease the page size to avoid wasting cache space on overflow items
+that aren't likely to be needed.
+
+- Applications with large keys and values, doing table scans, might
+increase the page size to avoid creating overflow items, as the overflow
+items must be read into memory in all cases, anyway.
+
+The \c internal_key_max, \c leaf_key_max and \c leaf_value_max
+configuration values allow applications to change the size at which a
+key or value will be treated as an overflow item.
+
+The value of \c internal_key_max is relative to the maximum internal
+page size. Because the number of keys on an internal page determines
+the depth of the tree, the \c internal_key_max value can only be
+adjusted within a certain range, and the configured value will be
+automatically adjusted by WiredTiger, if necessary to ensure a
+reasonable number of keys fit on an internal page.
+
+The values of \c leaf_key_max and \c leaf_value_max are not relative to
+the maximum leaf page size. If either is larger than the maximum page
+size, the page size will be ignored when the larger keys and values are
+being written, and a larger page will be created as necessary.
+
+Most applications should not need to tune the maximum key and value
+sizes. Applications requiring a small page size, but also having
+latency concerns such that the additional work to retrieve an overflow
+item is an issue, may find them useful.
+
+An example of configuring a large leaf overflow value:
+
+@snippet ex_all.c Create a table and configure a large leaf value max
+
+@section tune_page_sizes_split_percentage Split percentage
+
+The \c split_pct configuration string configures the size of a split
+page. When a page grows sufficiently large that it must be written as
+multiple disk blocks, the newly written block size is \c split_pct
+percent of the maximum page size. This value should be selected to
+avoid creating a large number of tiny pages or repeatedly splitting
+whenever new entries are inserted. For example, if the maximum page
+size is 1MB, a \c split_pct value of 10% would potentially result in
+creating a large number of 100KB pages, which may not be optimal for
+future I/O. Or, if the maximum page size is 1MB, a \c split_pct value
+of 90% would potentially result in repeatedly splitting pages as the
+split pages grow to 1MB over and over. The default value for \c
+split_pct is 75%, intended to keep large pages relatively large, while
+still giving split pages room to grow.
+
+Most applications should not need to tune the split percentage size.
+
+@section tune_page_sizes_allocation_size Allocation size
+
+The \c allocation_size configuration value is the underlying unit of
+allocation for the file. As the unit of file allocation, it sets the
+minimum page size and how much space is wasted when storing small
+amounts of data and overflow items. For example, if the allocation size
+is set to 4KB, an overflow item of 18,000 bytes requires 5 allocation
+units and wastes about 2KB of space. If the allocation size is 16KB,
+the same overflow item would waste more than 10KB.
+
+The default allocation size is 4KB, chosen for compatibility with
+virtual memory page sizes and direct I/O requirements on common server
+platforms.
+
+Most applications should not need to tune the allocation size; it is
+primarily intended for applications coping with the specific
+requirements some file systems make to support features like direct I/O.
+
+*/