src/third_party/wiredtiger/src/docs/durability-checkpoint.dox


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132

/*! @page durability_checkpoint Checkpoint-level durability

WiredTiger supports checkpoint durability by default, and optionally
commit-level durability when logging is enabled.  In most applications,
commit-level durability impacts performance more than checkpoint
durability; checkpoints offer basic operation durability across
application or system failure without impacting performance (although
the creation of each checkpoint is a relatively heavy-weight operation).
See @ref durability_log for information on commit-level durability.

@section checkpoint_snapshot Checkpoints vs. snapshots

Here is a brief explanation of the terms "checkpoint" and "snapshot", as
they are widely used in this manual. A checkpoint is an on-disk entity that
captures the persistent state of some or all of the database, while a snapshot
is a lightweight in-memory entity that captures the current state of pending
updates in the cache. Isolation refers to snapshots, because isolation is about
runtime state and which updates can be seen by other threads' transactions
as they run. Durability refers to checkpoints, because durability is about
on-disk persistence.  The two concepts are closely connected, of course;
when a checkpoint is created the code involved uses a snapshot to determine
which updates should and should not appear in the checkpoint.

@section checkpoint_checkpoints Checkpoints

A checkpoint is automatically created for each individual file whenever the last
reference to a modified data source is closed.

Checkpoints of the entire database can be explicitly created with the
WT_SESSION::checkpoint method. Automatic database-wide checkpoints can be
scheduled based on elapsed time or data size with the ::wiredtiger_open \c
checkpoint configuration. In this mode of operation, an internal server thread
is created to perform these checkpoints.

All transactional updates committed before a checkpoint are made durable
by the checkpoint, therefore the frequency of checkpoints limits the
volume of data that may be lost due to application or system failure.

Data sources that are involved in an exclusive operation when the checkpoint
starts, including bulk load, upgrade or salvage, will be skipped by the
checkpoint.

When a data source is first opened, it appears in the same state it was
in when it was most recently checkpointed. In other words, updates after
the most recent checkpoint (for example, in the case of failure), will not
appear in the data source at checkpoint-level durability.  If no checkpoint
is found when the data source is opened, the data source will appear empty.

@subsection checkpoint_target Checkpointing specific objects

The WT_SESSION::checkpoint method supports checkpoint of a set of target objects
(as opposed to a database-wide checkpoint), using the \c target configuration.

\warning
If objects are separately checkpointed from a database-wide checkpoint, it
is obviously possible to introduce data consistency problems if the objects
have related data. Additionally, because WiredTiger has a single database-wide
file used to store data history and other transactional information related
to the database objects, checkpointing a set of target objects without also
checkpointing that database-wide store can also lead to data inconsistencies.
In other words, it is difficult to safely use a targeted checkpoint in the
current WiredTiger release. Future development directions for the WiredTiger
library include splitting the database-wide history file into per-object files,
which will then be automatically checkpointed any time a database object is
checkpointed. For that reason, we anticipate checkpointing a set of objects
will become useful again in the future, but extreme caution is warranted
until those changes are made.

@section checkpoint_cursors Checkpoint cursors

Cursors are normally opened in the most recent version of a data source.
However, the \c checkpoint configuration string may be provided
to WT_SESSION::open_cursor, opening a read-only, static view of the
data source.  This provides a limited form of time-travel, as the static
view is not changed by subsequent checkpoints and will persist until
the checkpoint cursor is closed. While it is not an error to set a read
timestamp in a transaction including a checkpoint cursor, it also has no
effect on the data returned by the checkpoint cursor.

@section checkpoint_naming Checkpoint naming

Checkpoints that do not include LSM trees may optionally be given names by the
application.  Checkpoints named by the application persist until explicitly
discarded or replaced with a new checkpoint by the same name.  (If an
application attempts to replace an existing checkpoint, and it cannot be
removed, either because a cursor is reading from the previous checkpoint, or
because backups are in progress, the new checkpoint will fail and the previous
checkpoint will remain.) Because named checkpoints persist until discarded or
replaced, they can be used to save the state of the data for later use.

Internal checkpoints, that is, checkpoints not named by the application, use the
reserved name \c WiredTigerCheckpoint. (All checkpoint names beginning with this
string are reserved.) Applications can open the most recent of these checkpoints
by specifying \c WiredTigerCheckpoint as the checkpoint name to
WT_SESSION::open_cursor.

\warning
Applications wanting a consistent view of the data in two separate tables must
either use named checkpoints or explicitly control when checkpoints are taken,
as there is a race between opening the default checkpoints in two different
tables and a checkpoint operation replacing the default checkpoint in one of
those tables.

The \c -c option to the \c wt command line utility \c list command will list a
data source's checkpoints, with time stamp, in a human-readable format.

@section checkpoint_backup Checkpoint durability and backups

Backups are done using backup cursors (see @ref backup for more information).

\warning
When applications are using checkpoint-level durability, checkpoints taken
while a backup cursor is open are not durable. That is, if a crash occurs
when a backup cursor is open, then the system will be restored to the most
recent checkpoint prior to the opening of the backup cursor, even if later
database checkpoints were completed. As soon as the backup cursor is closed,
the system will again be restored to the most recent checkpoint taken.

Applications using commit-level durability retain durability via the write-ahead
log even though checkpoints taken while a backup cursor is open are not durable.
All log files are retained once the backup cursor is opened and, in the event of
a crash, all operations will be replayed to provide durability.

@section checkpoint_compaction Checkpoints and file compaction

Checkpoints share file blocks, and dropping a checkpoint may or may not
make file blocks available for re-use, depending on whether the dropped
checkpoint contained the last reference to those file blocks.  Because named
checkpoints are not discarded until explicitly discarded or replaced, they may
prevent WT_SESSION::compact from reducing file size due to shared file blocks.

*/