
/*! @page durability_overview Durability overview

<i>Durability</i> is the property that a transaction, once committed, is
permanent and its changes are never lost.  This can mean a number of things
depending on context: for example, in-memory databases by definition do not
survive system crashes, but commits to them are still durable in the sense that
they are not rolled back or lost while the database is running.  Traditionally,
data updates are durable when they have been stored on stable storage in a way
that survives application and system crashes.

For WiredTiger, the situation is more complicated because WiredTiger is designed
to be able to participate in application-level distributed transaction schemes,
which implies the ability to roll back committed transactions when so instructed
by the application. In this manual we use the following terminology:

- A transaction is <i>committed</i> when
WT_SESSION::commit_transaction has been called and has returned successfully.
- A transaction is <i>durable</i> when all changes in it have been written to
stable storage such that failures will not cause it to be lost.
- A transaction is <i>stable</i> when it is durable, and furthermore cannot
be rolled back by application-level transaction management activity.
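
As a rough illustration of the terminology above, the three states can be read
as an ordered progression in which each state implies the previous one. The
sketch below is a stand-alone model and not WiredTiger code: the names
txn_state, txn_classify and txn_survives_crash are hypothetical, and the three
boolean inputs are simply assumed to be known to the caller.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical model of the states defined above; not part of the WiredTiger API.
enum txn_state { TXN_UNCOMMITTED, TXN_COMMITTED, TXN_DURABLE, TXN_STABLE };

// Classify a transaction from three observations. Each state implies the
// previous one, so a later flag only counts if the earlier ones hold.
enum txn_state
txn_classify(bool commit_returned, bool on_stable_storage, bool past_rollback)
{
    if (!commit_returned)
        return TXN_UNCOMMITTED;
    if (!on_stable_storage)
        return TXN_COMMITTED;
    if (!past_rollback)
        return TXN_DURABLE;
    return TXN_STABLE;
}

// In this model, a transaction is protected from loss once it is durable.
bool
txn_survives_crash(enum txn_state s)
{
    assert(s <= TXN_STABLE);
    return s >= TXN_DURABLE;
}
```

In this model a committed transaction that is not yet durable can still be
lost, and only a stable one is beyond the reach of application-level rollback.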

This page describes the points at which transactions move from one of these
states to the next, and the kinds of failures transactions are protected
against in each.

In general, there is a trade-off between write performance and
durability guarantees: weaker guarantees require fewer round
trips to storage devices.

@section explain_durability_in_memory In-memory databases

For in-memory databases, there is no disk-level durability and no protection
against system or application crashes.

@section explain_durability_checkpoint Checkpoint durability (without timestamps)

For objects using checkpoint durability, the object is checkpointed periodically
(a checkpoint is a complete self-consistent capture of the database state saved
to stable storage) and transactions become durable with the completion of the
next checkpoint after they commit.  The interval between checkpoints bounds the
possible data loss.  Without timestamps, a durable transaction is also stable.
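
To make the timing concrete, the following sketch computes when a commit
becomes durable under checkpoint durability. It is a simplified model, not
WiredTiger code: the struct ckpt type and the durable_time function are
hypothetical, and it assumes that a checkpoint captures exactly the
transactions committed at or before its start time.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical model: a checkpoint captures everything committed at or before
// its start time and makes it durable at its finish time.
struct ckpt {
    long start;
    long finish;
};

// Return the time at which a transaction committed at commit_time becomes
// durable, or -1 if none of the listed checkpoints covers it. The checkpoints
// must be sorted by start time.
long
durable_time(const struct ckpt *ckpts, size_t n, long commit_time)
{
    assert(ckpts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ckpts[i].start >= commit_time)
            return ckpts[i].finish;
    return -1;
}
```

Under these assumptions, a commit that lands just after a checkpoint starts
waits for the following checkpoint to finish, which is why the checkpoint
interval bounds the window of possible loss.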

@section explain_durability_commit Commit-level durability (without timestamps)

Changes to objects with commit-level durability have log records written and
flushed to disk before WT_SESSION::commit_transaction returns. Transactions on
such objects therefore become durable as soon as they are committed, and will
survive both application and system failure. Without timestamps, a durable
transaction is also stable.

It is possible to skip the flush to disk; this makes transactions immediately
durable against application crashes, but not against system crashes.
Transactions then become durable against system crashes when the operating system
writes them out. (If the operating system were never to write the log records,
the object would revert to checkpoint durability.)
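
The difference between the two variants can be summarized as a small truth
table. The sketch below is an illustrative model only: the enum names and
commit_survives are hypothetical, and it assumes that an unflushed log record
has at least been handed to the operating system when the commit returns.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical model of where a commit's log record is when the commit returns.
enum log_state { LOG_IN_OS_CACHE, LOG_ON_DISK };
enum failure { APP_CRASH, SYSTEM_CRASH };

// Return true if the commit survives the given failure in this model.
bool
commit_survives(enum log_state where, enum failure what)
{
    assert(where == LOG_IN_OS_CACHE || where == LOG_ON_DISK);
    // The operating system keeps written data across an application crash.
    if (what == APP_CRASH)
        return true;
    // Only flushed records are guaranteed to survive a system crash.
    return where == LOG_ON_DISK;
}
```

A record in the LOG_IN_OS_CACHE state moves to LOG_ON_DISK whenever the
operating system writes it out, at which point the commit also survives a
system crash.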

@section explain_durability_timestamp Adding timestamps

For databases using WiredTiger's timestamped data model, the durability model is
extended with the notion of a <i>stable timestamp</i>.  The stable timestamp is
an application-managed point in time that governs when durable data becomes
stable. That is, transactions commit at times after the stable timestamp, and
become durable according to the underlying durability model, but can still be
rolled back (and are thus considered <i>unstable</i>) until the stable timestamp
is advanced past their commit timestamp.

Applications can roll back all unstable transactions by calling
WT_CONNECTION::rollback_to_stable (or running recovery as part of database
open).
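
The effect of such a rollback on a set of committed transactions can be
sketched as a filter on commit timestamps. This is a toy model rather than a
description of how WiredTiger implements it: discard_unstable is a
hypothetical name, and the model assumes that a transaction whose commit
timestamp equals the stable timestamp is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the effect of rolling back to stable: keep the
// commit timestamps at or before stable_ts, compacting the array in place,
// and return how many were kept.
size_t
discard_unstable(uint64_t *commit_ts, size_t n, uint64_t stable_ts)
{
    size_t kept = 0;

    assert(commit_ts != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (commit_ts[i] <= stable_ts)
            commit_ts[kept++] = commit_ts[i];
    return kept;
}
```

For example, with commit timestamps 5, 12, 8 and 20 and a stable timestamp of
10, only the transactions at 5 and 8 remain in this model.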

Objects in a timestamp system are generally configured for checkpoint-level
durability, in which case the application's stable timestamp is saved with each
checkpoint and used for crash recovery. Thus, to be stable, a transaction must
commit, the stable timestamp must be advanced past its durable timestamp, and
then a checkpoint must complete. (Data written as part of checkpoints that
complete between transaction commit and advancing the stable timestamp becomes
durable but not stable.)
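
The three conditions can be combined into a single predicate. The sketch
below is a model under stated assumptions, not WiredTiger code:
txn_is_stable is a hypothetical name, ckpt_stable_ts stands for the stable
timestamp saved by the most recent completed checkpoint (zero meaning that no
such checkpoint exists yet), and a durable timestamp equal to the saved stable
timestamp is assumed to count as covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical predicate: a transaction is stable once it has committed and a
// completed checkpoint has saved a stable timestamp that covers its durable
// timestamp. A ckpt_stable_ts of zero means no such checkpoint exists yet.
bool
txn_is_stable(bool committed, uint64_t durable_ts, uint64_t ckpt_stable_ts)
{
    // This model treats zero as "no timestamp", so it is not a valid input.
    assert(durable_ts != 0);
    return committed && ckpt_stable_ts != 0 && durable_ts <= ckpt_stable_ts;
}
```

A checkpoint that completes before the stable timestamp has advanced far
enough leaves ckpt_stable_ts below durable_ts, which corresponds to the
durable but not stable case described above.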

Objects in a timestamp system that are configured for commit-level durability
operate as they do without timestamps, and timestamps are ignored for those
objects.

*/