summaryrefslogtreecommitdiff
path: root/src/third_party/wiredtiger/src/docs/tune-compression.dox
blob: 8db2151aa76d9820abdc20bfa1ef2ff227f8898f (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
/*! @page tune_compression Compression

WiredTiger includes a number of optional compression techniques.  Configuring
compression generally decreases on-disk and in-memory resource requirements
and the amount of I/O, and increases CPU cost when data are read and written.

Configuring compression may change application throughput.  For example,
in applications using solid-state drives (where I/O is less expensive),
turning off compression may increase application performance by reducing
CPU costs; in applications where I/O costs are more expensive, turning on
compression may increase application performance by reducing the overall
number of I/O operations.

An example of turning on row-store key prefix compression:

@snippet ex_all.c Configure key prefix compression on

An example of turning on row-store or column-store dictionary compression:

@snippet ex_all.c Configure dictionary compression on

@section compression_formats Block Compression Formats
WiredTiger provides two methods of compressing your data when using block
compression: the raw and noraw methods. These methods change how WiredTiger
works to fit data into the blocks that are stored on disk.

@subsection noraw_compression Noraw Compression
Noraw compression is the traditional compression model where a fixed
amount of data is given to the compression system, then turned into a
compressed block of data. The amount of data chosen to compress is the
data needed to fill the uncompressed block. Thus when compressed, the block will
be smaller than the normal data size and the sizes written to disk will often
vary depending on how compressible the data being stored is.  Algorithms
using noraw compression include zlib-noraw, lz4-noraw and snappy.

@subsection raw_compression Raw Compression
WiredTiger's raw compression takes advantage of compressors that provide a
streaming compression API. Using the streaming API WiredTiger will try to fit
as much data as possible into one block. This means that blocks created
with raw compression should be of similar size. Using a streaming compression
method should also make for less overhead in compression, as the setup and
initial work for compressing is done fewer times compared to the amount of
data stored.  Algorithms using raw compression include zlib, lz4.

@subsection to_raw_or_noraw Choosing between Raw and Noraw Compression
When looking at which compression method to use the biggest consideration is
that raw compression will normally provide higher compression levels while
using more CPU for compression.

An additional consideration is that raw compression may provide a performance
advantage in workloads where data is accessed sequentially. That is because
more data is generally packed into each block on disk. Conversely, noraw
compression may perform better for workloads with random access patterns
because each block will tend to be smaller and require less work to read and
decompress.

See @ref file_formats_compression for more information on available
compression techniques.

See @ref compression for information on how to configure and enable compression.

 */