/*! @page file_formats WiredTiger files

@section formats File formats

WiredTiger supports two underlying file formats: row-store and
column-store.  WiredTiger row- and column-stores are both key/value
stores.

In a row-store, both keys and data are variable-length byte strings.  In
a column-store, the key is a 64-bit record number, and the data item is
either a variable- or fixed-length byte string.

Generally, row-stores are faster for queries where a set of columns is
required by every lookup (because there's only a single set of metadata
pages to go through, or read into the cache).  Column-stores are faster
for queries where only a few of the columns are required for any lookup
(because only the columns being returned are present in the cache).

Row-stores support three types of compression: prefix compression (where
any identical portion of each key byte string is stored only once),
Huffman encoding of individual key/value items (see @subpage huffman
for details), and stream compression of the blocks in the file (see @ref
compression for details).
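
As an illustration of the prefix compression idea only, and not of
WiredTiger's on-disk format, the following sketch counts the leading
bytes a key shares with the key before it, so that only the remaining
suffix would need to be stored.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: count the leading bytes that key shares with prev.
static size_t
shared_prefix_len(const unsigned char *prev, size_t prev_len,
    const unsigned char *key, size_t key_len)
{
	size_t i, max;

	assert(prev != NULL && key != NULL);

	// Compare only as far as the shorter of the two keys.
	max = prev_len < key_len ? prev_len : key_len;
	for (i = 0; i < max && prev[i] == key[i]; ++i)
		;
	return (i);
}

// Hypothetical helper: bytes left to store once the shared prefix is dropped.
static size_t
suffix_len(const unsigned char *prev, size_t prev_len,
    const unsigned char *key, size_t key_len)
{
	return (key_len - shared_prefix_len(prev, prev_len, key, key_len));
}
```

In this sketch, a key "application" following "apple" shares 4 leading
bytes, leaving 7 bytes of suffix to store.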

Unlike some row-stores, WiredTiger does not support duplicate data
items: for any single key, there can be only a single value, and
applications are responsible for creating unique key/value pairs.

Column-stores with variable-length byte string values support three
types of compression: run-length encoding (where duplicate values are
stored only once), Huffman encoding of individual value items (see
@ref huffman for details), and stream compression of the blocks in
the file (see @ref compression for details).
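
A minimal sketch of run-length encoding, assuming integer values in
place of byte strings to keep the example short: consecutive duplicate
values collapse into a single stored value plus a repeat count.  The
structure and function names are hypothetical and do not reflect
WiredTiger's internal representation.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical run record: one stored value plus its repeat count.
struct run {
	int value;
	size_t count;
};

// Collapse consecutive duplicates in values[0..n) into runs[]; the caller
// must supply room for n runs.  Returns the number of runs written.
static size_t
rle_encode(const int *values, size_t n, struct run *runs)
{
	size_t i, nruns;

	assert(n == 0 || (values != NULL && runs != NULL));

	nruns = 0;
	for (i = 0; i < n; ++i) {
		if (nruns > 0 && runs[nruns - 1].value == values[i])
			// Same value as the current run: extend it.
			++runs[nruns - 1].count;
		else {
			// Different value: start a new run.
			runs[nruns].value = values[i];
			runs[nruns].count = 1;
			++nruns;
		}
	}
	return (nruns);
}
```

Only adjacent duplicates are merged in this sketch; a value that
reappears later starts a new run.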

Column-stores with fixed-length byte values support a single type of
compression: stream compression of the blocks in the file (see @ref
compression for details).

In row-stores, keys and values too large to fit on a normal page are
stored as overflow items in the file.  In variable-length column-stores,
values too large to fit on a normal page are stored as overflow items
in the file.

WiredTiger allocates space from the underlying files in block units.
The minimum file allocation unit WiredTiger supports is 512B and the
maximum file allocation unit is 512MB.  File block offsets are 64-bit
(meaning the maximum file size is very, very large).

Variable-length column-store values, and row-store keys and values, can
be up to (4GB - 512B) in length.
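
The limits above can be written out as arithmetic.  The sketch below
assumes binary units (1MB = 2^20 bytes, 1GB = 2^30 bytes), which gives
a maximum item length of 4294966784 bytes; the macro and function names
are hypothetical, and only the documented ranges are checked.

```c
#include <assert.h>
#include <stdint.h>

// Limits as stated in the text above; the macro names are hypothetical.
#define	ALLOC_UNIT_MIN	UINT64_C(512)					// 512B
#define	ALLOC_UNIT_MAX	(UINT64_C(512) * 1024 * 1024)			// 512MB
#define	ITEM_LEN_MAX	(UINT64_C(4) * 1024 * 1024 * 1024 - 512)	// 4GB - 512B

// Return 1 if size lies within the documented allocation unit range.
static int
alloc_unit_in_range(uint64_t size)
{
	return (size >= ALLOC_UNIT_MIN && size <= ALLOC_UNIT_MAX);
}

// Return 1 if len fits within the documented maximum item length.
static int
item_len_ok(uint64_t len)
{
	return (len <= ITEM_LEN_MAX);
}
```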

Fixed-length values are limited to 8 bits (that is, only values between
0 and 255 may be stored in fixed-length column-store files).

@section remote Remote file systems

WiredTiger objects may be stored on remote file systems if the remote
file system conforms to ISO/IEC 9945-1:1996 (POSIX.1).  In the case of
read-only objects, multiple systems may access the objects
simultaneously; objects which are written cannot be accessed by more
than a single system.  Because remote file systems are often slower than
local file systems, using a remote file system for storage may degrade
performance.

@section file File permissions

WiredTiger creates file system objects readable and writable by the
process owner, group and others, as modified by the process's umask value.
The group ownership of created file system objects may vary depending
on the system, and is not controlled by WiredTiger.
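
The effect of the umask can be sketched as follows, assuming the
requested mode is read/write for every permission class (0666); that
value is an assumption made for illustration, and the names are
hypothetical.  The umask clears whichever permission bits it names.

```c
#include <assert.h>
#include <sys/types.h>

// Assumed requested mode: read/write for owner, group and others (0666).
#define	REQUESTED_MODE	((mode_t)0666)

// Permission bits left after the umask clears the bits it names.
static mode_t
effective_mode(mode_t requested, mode_t mask)
{
	return (requested & ~mask);
}
```

Under that assumption, a umask of 022 would leave mode 0644, and a
umask of 077 would leave mode 0600.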

*/