/*! @page arch-index WiredTiger Architecture Guide

The WiredTiger Architecture Guide provides a comprehensive overview
of WiredTiger internals and code that should be useful for any database
engineer wanting to understand how the storage engine works. The goal
of this guide is to help both MongoDB engineers and external users
quickly understand the workings of WiredTiger.

\warning
The Architecture Guide is not updated in lockstep with the code base and is not
necessarily correct or complete for any specific release.

The relationships between the software components in WiredTiger are
illustrated in the diagram below. An arrow originating at Component A
and pointing to Component B indicates that Component B is used by
Component A.

<div class="arch_diagram">
@plantuml_start{wt_diagram.png }
@startuml{wt_diagram.png}

' We add spacing to the diagram in places to influence the layout.
' To do this, we create some invisible components with hidden arrows
' pointing to them.  Since we don't otherwise use the "file" component,
' we set all its parts to be transparent, and any use of "file" results
' in an invisible spacer whose width is directed by the length of its label.
' When modifying this diagram, it's sometimes useful to comment out the
' following lines, and any [hidden] directives used below, to see how
' the spacers influence the layout.  Note that this may be fragile;
' the spacers give hints to the layout, and such hints will not always be honored.

skinparam fileBorderColor Transparent
skinparam fileBackgroundColor Transparent
skinparam fileFontColor Transparent
skinparam fileShadowing false

' Our diagram is simple.  First, we define lots of labeled rectangles
' with most nesting within the "engine" rectangle.

together {
  rectangle "[[arch-python.html Python API]]" as python_api
  ' "storage" displays as an oval.
  storage "       C/C++  \n   applications   " as application
  rectangle "[[command_line.html wt Utility]]" as utility
}

' Trailing spaces for this label put the text to the left.
rectangle "**WiredTiger Engine**                                                                 " as wt_engine {
  ' Leading and trailing spaces make a wide rectangle.
  together {
    ' Putting two invisible file boxes on either side centers the middle box.
    file "____" as SPACE_api
    rectangle "                                        [[modules.html C API]]                                        " as c_api
    file "____" as SPACE_api2
    ' Influence the ordering of the invisible boxes using (hidden) arrows.
    SPACE_api -[hidden]right-> c_api
    c_api -[hidden]right-> SPACE_api2
  }
  rectangle "[[arch-schema.html Schema]]" as schema
  rectangle "[[arch-cursor.html Cursor]]" as cursor
  rectangle "[[arch-transaction.html Transactions]]" as txn
  rectangle "[[arch-metadata.html Metadata]]" as meta
  rectangle "[[arch-dhandle.html dhandle/\nBtree]]" as btree
  rectangle "[[arch-row-column.html Row/Column\nStorage]]" as row
  rectangle "[[arch-hs.html History\nStore]]" as history
  rectangle "[[arch-snapshot.html Snapshots]]" as snapshot
  rectangle "[[arch-cache.html Cache]]" as cache
  rectangle "[[arch-eviction.html Eviction]]" as evict

  together {
    rectangle "[[arch-block.html Block\nManager]]" as block
    file "__________" as SPACE_log
    rectangle "[[arch-logging.html Logging]]" as log
    file "___" as SPACE_log2
  }
  rectangle "                              [[arch-fs-os.html File System & OS interface]]                              " as os
}
together {
  database "[[arch-data-file.html Database\nFiles]]" as wt_file
  database "  [[arch-log-file.html Log\nFiles]]" as log_file
}

' Influence the ordering at the top using (hidden) arrows.
python_api -[hidden]right-> application
application -[hidden]right-> utility

python_api -down-> c_api
application -down-> c_api
utility -down-> c_api

c_api -down-> schema
c_api -down-> cursor
c_api -down-> txn

schema -down-> meta
schema -down-> btree
cursor -down-> btree
btree -down-> row
meta -up-> cursor
' The hidden arrow helps our boxes to line up in a better way.
meta -[hidden]right-> btree
cursor -[hidden]right-> txn
txn -down-> snapshot
row -down-> cache
cache -down-> history
evict -down-> history
history -up-> cursor
snapshot -down-> evict
cache -right-> evict
cache -down-> block
evict -down-> block
txn -down-> log

block -[hidden]right-> SPACE_log
cache -[hidden]down-> SPACE_log
evict -[hidden]down-> SPACE_log
SPACE_log -[hidden]right-> log
log -[hidden]right-> SPACE_log2

block -down-> os
log -down-> os
os -down-> wt_file
os -down-> log_file

wt_file -[hidden]right-> log_file

@enduml
@plantuml_end
</div>

For those unfamiliar with storage engine architecture and/or seeking an introduction
to WiredTiger, we recommend reading the guide in the order presented. You can find
much of the architecture-specific terminology explained in the @ref arch-glossary.
For an application-level view of WiredTiger, head over to the @ref basic_api section
of the documentation.

<div class="arch_toc">
<h1>Table of Contents</h1>

- @subpage arch-toc-fundamentals
    - @ref arch-connection
    - @ref arch-session
    - @ref arch-cursor
    - @ref arch-transaction
    - @ref arch-timestamp
    - @ref arch-snapshot
    - @ref arch-rts
    - @ref arch-fast-truncate

- @subpage arch-toc-data-org
    - @ref arch-schema
    - @ref arch-metadata

- @subpage arch-toc-data-src
    - @ref arch-dhandle
    - @ref arch-btree
    - @ref arch-row-column

- @subpage arch-toc-in-mem
    - @ref arch-cache
    - @ref arch-eviction

- @subpage arch-toc-mem-disk
    - @ref arch-block
    - @ref arch-data-file
    - @ref arch-s3-extension

- @subpage arch-toc-on-disk
    - @ref arch-checkpoint
    - @ref arch-hs
    - @ref arch-backup
    - @ref arch-compact

- @subpage arch-toc-recovery
    - @ref arch-logging
    - @ref arch-log-file

- @subpage arch-toc-tools
    - @ref arch-python
    - @ref command_line

- @subpage arch-toc-platform
    - @ref arch-fs-os

- @subpage arch-glossary

</div>

*/

/*! @page arch-toc-fundamentals Fundamentals

@subpage arch-connection
- A connection is a handle to a WiredTiger database instance.

@subpage arch-session
- A session defines the context for most operations performed in WiredTiger.

@subpage arch-cursor
- Cursors are used to get and modify data.

@subpage arch-transaction
- Transactions provide a powerful abstraction for multiple threads to
operate on data concurrently.

@subpage arch-timestamp
- The timestamp data model.

@subpage arch-snapshot
- Snapshots are implemented by storing transaction ids committed before
the transaction started.

@subpage arch-rts
- Roll back the database to a stable state by removing data that is beyond the
stable timestamp.

@subpage arch-fast-truncate
- Deleting whole pages at once without reading them, and how such pages
are subsequently handled.

*/

/*! @page arch-toc-data-org Data Organization

@subpage arch-schema

A schema defines the format of the application data in WiredTiger.

@subpage arch-metadata

Metadata is stored as <code>uri, config</code> key/value pairs in a designated table.
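
As a rough illustration of the <code>uri, config</code> pairing, the self-contained C sketch
below models the metadata as an array of string pairs and looks up a configuration by URI.
The names \c meta_entry and \c meta_lookup and the sample entries are hypothetical and
invented for this example; this is not how WiredTiger implements its metadata table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical model of one metadata entry: a URI key and its configuration string value.
struct meta_entry {
    const char *uri;
    const char *config;
};

// Sample entries, invented for illustration only.
static const struct meta_entry meta_table[] = {
    {"table:example", "key_format=S,value_format=S"},
    {"file:example.wt", "key_format=S,value_format=S"},
};

// Return the configuration string stored for uri, or NULL if there is no such key.
static const char *
meta_lookup(const char *uri)
{
    size_t i;

    assert(uri != NULL);
    for (i = 0; i < sizeof(meta_table) / sizeof(meta_table[0]); i++)
        if (strcmp(meta_table[i].uri, uri) == 0)
            return (meta_table[i].config);
    return (NULL);
}
```

In this sketch, looking up <code>"table:example"</code> yields its configuration string, and
an unknown URI yields \c NULL.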

*/

/*! @page arch-toc-data-src Data Sources

@subpage arch-btree

A B-Tree is one type of underlying data source in a dhandle and is organized into pages.

@subpage arch-dhandle

An internal structure called a Data Handle (dhandle) is used to represent and
access Btrees and other data sources in WiredTiger.

@subpage arch-row-column

Row stores and column stores are B-Trees. Row stores have variable-size keys
and data, while column stores use a record id as their key.

*/

/*! @page arch-toc-in-mem In Memory Concepts

@subpage arch-cache

Cache is represented by the various shared data structures that
make up in-memory Btrees and subordinate data structures.

@subpage arch-eviction

Eviction represents the process of removing old data from the cache,
writing it to disk if it is dirty.
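
The idea can be sketched with a toy model in C. Everything here is an assumption made for
illustration: the \c toy_page structure, the \c toy_evict_one function, and the
least-recently-used selection rule are hypothetical, and the "write" is only counted rather
than performed. The actual policy is described in @ref arch-eviction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical in-memory page: when it was last used and whether it has unwritten changes.
struct toy_page {
    uint64_t last_used;
    bool dirty;
    bool in_cache;
};

// Evict the least recently used cached page. A dirty page is "written" first, which this
// sketch only counts. Returns the index of the evicted page, or -1 if nothing is cached.
static int
toy_evict_one(struct toy_page *pages, size_t npages, unsigned *writes)
{
    size_t i;
    int victim = -1;

    assert(pages != NULL && writes != NULL);
    for (i = 0; i < npages; i++)
        if (pages[i].in_cache &&
            (victim < 0 || pages[i].last_used < pages[(size_t)victim].last_used))
            victim = (int)i;
    if (victim < 0)
        return (-1);
    if (pages[victim].dirty) {
        ++*writes; // Stand-in for writing the page to disk.
        pages[victim].dirty = false;
    }
    pages[victim].in_cache = false;
    return (victim);
}
```

Calling \c toy_evict_one repeatedly removes pages from oldest to newest, and only the dirty
ones add to the write count.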

*/

/*! @page arch-toc-mem-disk Moving Data Between Memory and Disk

@subpage arch-block

The block manager manages the reading and writing of disk blocks.

@subpage arch-data-file

The format of the data file is given by structures in \c block.h .

@subpage arch-s3-extension

The S3 storage source extension manages flushing data files to, and reading them from, the S3 cloud object store.

*/

/*! @page arch-toc-on-disk On Disk Concepts

@subpage arch-checkpoint

A checkpoint is created by WiredTiger to serve as a point from which it can recover.

@subpage arch-hs

The History Store tracks old versions of records.

@subpage arch-backup

Hot backup uses a type of cursor to back up the database.

@subpage arch-compact

The compaction process can be used to reclaim unused space from on-disk files.

*/

/*! @page arch-toc-recovery Recovery

@subpage arch-logging

When logging is configured, WiredTiger writes all changes to a write-ahead log.

@subpage arch-log-file

The format of a log file is defined in \c log.h .

*/

/*! @page arch-toc-tools Tools

@subpage arch-python

WiredTiger has a Python API that is useful for scripting and experimentation.

@subpage command_line

The \c wt tool is a command-line utility that provides access to various pieces
of WiredTiger functionality.

*/

/*! @page arch-toc-platform Cross-Platform Support

@subpage arch-fs-os

A layer of abstraction sits above all operating system calls, and
a set of functions can be registered to be called for each file system
operation.
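
One common way to build such a layer is a table of function pointers, sketched below in
self-contained C. The structure and function names (\c toy_file_system, \c toy_fs_register,
\c toy_exist, and the two callback sets) are hypothetical and are not the WiredTiger
interfaces; see @ref arch-fs-os for the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical table of file system operations; each member is a registered callback.
struct toy_file_system {
    int (*fs_exist)(const char *name);
    int (*fs_remove)(const char *name);
};

// Default callbacks built on the C standard library.
static int
stdio_exist(const char *name)
{
    FILE *fp;

    if ((fp = fopen(name, "rb")) == NULL)
        return (0);
    (void)fclose(fp);
    return (1);
}

static int
stdio_remove(const char *name)
{
    return (remove(name));
}

// Alternative callbacks: a fake file system that contains exactly one undeletable file.
static int
fake_exist(const char *name)
{
    return (strcmp(name, "toy-fs-sketch-no-such-file") == 0);
}

static int
fake_remove(const char *name)
{
    (void)name;
    return (-1);
}

static const struct toy_file_system fake_fs = {fake_exist, fake_remove};
static struct toy_file_system toy_fs = {stdio_exist, stdio_remove};

// Register a replacement set of callbacks; callers above this layer are unaffected.
static void
toy_fs_register(const struct toy_file_system *custom)
{
    assert(custom != NULL && custom->fs_exist != NULL && custom->fs_remove != NULL);
    toy_fs = *custom;
}

// Code above the layer calls through the table rather than the operating system directly.
static int
toy_exist(const char *name)
{
    return (toy_fs.fs_exist(name));
}
```

After <code>toy_fs_register(&fake_fs)</code>, the same \c toy_exist call is answered by the
fake callbacks instead of the standard library, without any change to the caller.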

*/