/*! @page huffman Huffman Encoding

Keys in row-stores and variable-length values in either row- or
column-stores can be compressed with Huffman encoding.

Huffman compression is maintained in memory as well as on disk, and can
increase the amount of usable data the cache can hold as well as
decrease the size of the data on disk.

To configure Huffman encoding for the key in a row-store, specify \c
"btree_huffman_key=english" or \c "btree_huffman_key=<file>" in the
configuration passed to \c WT_SESSION::create.

To configure Huffman encoding for a variable-length value in either a
row-store or a column-store, specify \c "btree_huffman_value=english"
or \c "btree_huffman_value=<file>" in the configuration passed to
\c WT_SESSION::create.
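
As a minimal sketch of how the two settings combine into one configuration
string, the Python helper below joins them with a comma.  The helper name
\c huffman_config is hypothetical and is not part of WiredTiger; only the
\c btree_huffman_key and \c btree_huffman_value names come from the text
above, and the setting itself (\c "english" or a file) is passed through
unchanged.

```python
def huffman_config(key=None, value=None):
    """Build the Huffman part of a WT_SESSION::create configuration string.

    Hypothetical helper, for illustration only.  Each argument is either
    None (leave that setting out) or the setting to use, for example
    "english".
    """
    parts = []
    if key is not None:
        parts.append("btree_huffman_key=" + key)
    if value is not None:
        parts.append("btree_huffman_value=" + value)
    return ",".join(parts)


# Use the built-in English table for both keys and values.
config = huffman_config(key="english", value="english")
print(config)  # btree_huffman_key=english,btree_huffman_value=english
```

The resulting string would be passed, along with any other settings the
object needs, as the configuration argument of \c WT_SESSION::create.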

Setting Huffman encoding to \c "english" configures WiredTiger to use
a built-in English language frequency table.

Setting Huffman encoding to \c "<file>" configures WiredTiger to use the
frequency table read from the specified file.  Each line of the
frequency table file contains a pair of unsigned integers separated by
whitespace: the first integer is the symbol value and the second integer
is the frequency value.  Symbol values may be specified
as hexadecimal numbers (with a leading \c "0x" prefix), or as integers.
Symbol values must be unique and in the range of 0 to 65,535.  Frequency
values do not need to be unique, but must be in the range of 0 to the
maximum 32-bit unsigned integer value (4,294,967,295); the lower
the frequency value, the less likely the symbol is to occur.  Any
unspecified symbol values are assumed to have frequencies of 0.
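
The rules above can be checked before a table is handed to WiredTiger.
The following Python sketch is not WiredTiger's own parser; the function
names are hypothetical.  It assumes that blank lines are skipped, that a
symbol with a leading \c "0x" is hexadecimal and any other token is
decimal, and that frequency values are written in decimal.

```python
UINT16_MAX = 0xFFFF        # largest symbol value (65,535)
UINT32_MAX = 0xFFFFFFFF    # largest frequency value (4,294,967,295)


def parse_symbol(token):
    # Assumption: "0x"/"0X" prefix means hexadecimal, otherwise decimal.
    base = 16 if token.lower().startswith("0x") else 10
    return int(token, base)


def parse_frequency_table(text):
    """Return {symbol: frequency}; raise ValueError on a malformed table."""
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError("line %d: expected two integers" % lineno)
        symbol = parse_symbol(fields[0])
        frequency = int(fields[1], 10)
        if not 0 <= symbol <= UINT16_MAX:
            raise ValueError("line %d: symbol out of range" % lineno)
        if not 0 <= frequency <= UINT32_MAX:
            raise ValueError("line %d: frequency out of range" % lineno)
        if symbol in table:
            raise ValueError("line %d: duplicate symbol" % lineno)
        table[symbol] = frequency
    return table


example = """
0x41 10
66   3
"""
print(parse_frequency_table(example))  # {65: 10, 66: 3}
```

Symbols missing from the returned dictionary correspond to the
unspecified symbol values described above, which are treated as having a
frequency of 0.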

Input containing symbol values that do not appear in the frequency
table (or that appear in the frequency table with a frequency value of
0) is accepted, but does not compress as well as it would if those
symbols were listed in the frequency table with nonzero frequency values.
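
To illustrate why rare or unlisted symbols compress less well, the
Python sketch below computes code lengths for a textbook Huffman code.
It is not WiredTiger's implementation.  The function name is
hypothetical, and flooring a frequency of 0 to 1 so that every symbol
still receives a code is an assumption made only for this example.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a textbook Huffman code."""
    lengths = {symbol: 0 for symbol in freqs}
    if len(freqs) == 1:
        # A lone symbol still needs one bit.
        return {symbol: 1 for symbol in freqs}
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    # Assumption: a zero frequency is floored to 1 (illustration only).
    heap = [(max(f, 1), next(tie), [s]) for s, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside
        # them moves one level deeper, so its code grows by one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


freqs = {ord("e"): 120, ord("t"): 90, ord("q"): 1, ord("z"): 0}
lengths = huffman_code_lengths(freqs)
print(lengths[ord("e")], lengths[ord("t")], lengths[ord("z")])  # 1 2 3
```

In this example the most frequent symbol gets a 1-bit code while the
symbol with a frequency of 0 gets a 3-bit code, so input dominated by
low-frequency symbols encodes to more bits per symbol.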

*/