summaryrefslogtreecommitdiff
path: root/src/third_party/wiredtiger/src/docs/transactions.dox
blob: 25845ce020973b214c1280dceb9aa2fd2fa1c991 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
/*! @m_page{{c,java},transactions,Transactions}

@section transactions_acid ACID properties

Transactions provide a powerful abstraction for multiple threads to
operate on data concurrently because they have the following properties:

- Atomicity: all or none of a transaction is completed.
- Consistency: if each transaction maintains some property when considered
  separately, then the combined effect of executing the transactions
  concurrently will maintain the same property.
- Isolation: developers can reason about transactions as if they run
  single-threaded.
- Durability: once a transaction commits, its updates cannot be lost.

WiredTiger supports transactions with the following caveats to the ACID
properties:

- the maximum level of isolation supported is snapshot isolation.
  See @ref transaction_isolation for more details.
- transactional updates are made durable by a combination of checkpoints
  and logging.  See @ref checkpoint for information on checkpoint durability
  and @ref durability for information on commit-level durability.
- each transaction's uncommitted changes must fit in memory: for
  efficiency, WiredTiger does not write to the log until a transaction
  commits.

@section transactions_api Transactional API

In WiredTiger, transaction operations are methods off the WT_SESSION
class.

Applications call WT_SESSION::begin_transaction to start a new transaction.
Operations subsequently performed using that WT_SESSION handle, including
operations on any cursors open in that WT_SESSION handle (whether opened
before or after the WT_SESSION::begin_transaction call), are part of the
transaction and their effects committed by calling
WT_SESSION::commit_transaction, or discarded by calling
WT_SESSION::rollback_transaction. Applications that use
@ref transaction_timestamps can utilize the WT_SESSION::prepare_transaction API
as a basis for implementing a two phase commit protocol.

If WT_SESSION::commit_transaction returns an error for any reason, the
transaction was rolled back, not committed.

When transactions are used, data operations can encounter a conflict and
fail with the ::WT_ROLLBACK error.  If this error occurs, transactions
should be rolled back with WT_SESSION::rollback_transaction and the
operation retried.

The WT_SESSION::rollback_transaction method implicitly resets all
cursors in the session as if the WT_CURSOR::reset method was called,
discarding any cursor position as well as any key and value.

@snippet ex_all.c transaction commit/rollback

@section transactions_implicit Implicit transactions

If a cursor is used when no explicit transaction is active in a session,
reads are performed at the isolation level of the session, set with the
\c isolation key to WT_CONNECTION::open_session, and successful updates
are automatically committed before the update operation returns.

Any operation consisting of multiple related updates should be enclosed
in an explicit transaction to ensure the updates are applied atomically.

If an implicit transaction successfully commits, the cursors in the
WT_SESSION remain positioned.  If an implicit transaction fails, all
cursors in the WT_SESSION are reset, as if WT_CURSOR::reset were called,
discarding any position or key/value information they may have.

See @ref cursors_transactions for more information.

@section transactions_concurrency Concurrency control

WiredTiger uses optimistic concurrency control algorithms.  This avoids
the bottleneck of a centralized lock manager and ensures transactional
operations do not block: reads do not block writes, and vice versa.

Further, writes do not block writes, although concurrent transactions
updating the same value will fail with ::WT_ROLLBACK.  Some applications
may benefit from application-level synchronization to avoid repeated
attempts to rollback and update the same value.

Operations in transactions may also fail with the ::WT_ROLLBACK error if
some resource cannot be allocated after repeated attempts.  For example, if
the cache is not large enough to hold the updates required to satisfy
transactional readers, an operation may fail and return ::WT_ROLLBACK.

@section transaction_isolation Isolation levels

WiredTiger supports <code>read-uncommitted</code>,
<code>read-committed</code> and  <code>snapshot</code> isolation levels;
the default isolation level is <code>read-committed</code>.

- <code>read-uncommitted</code>:
Transactions can see changes made by other transactions before those
transactions are committed.  Dirty reads, non-repeatable reads and
phantoms are possible.

- <code>read-committed</code>:
Transactions cannot see changes made by other transactions before those
transactions are committed.  Dirty reads are not possible;
non-repeatable reads and phantoms are possible.  Committed changes from
concurrent transactions become visible when no cursor is positioned in
the read-committed transaction.

- <code>snapshot</code>:
Transactions read the versions of records committed before the transaction
started.  Dirty reads and non-repeatable reads are not possible; phantoms
are possible.<br><br>
Snapshot isolation is a strong guarantee, but not equivalent to a
single-threaded execution of the transactions, known as serializable
isolation.  Concurrent transactions T1 and T2 running under snapshot
isolation may both commit and produce a state that neither (T1 followed
by T2) nor (T2 followed by T1) could have produced, if there is overlap
between T1's reads and T2's writes, and between T1's writes and T2's
reads.

The transaction isolation level can be configured on a per-transaction
basis:

@snippet ex_all.c transaction isolation

Additionally, the default transaction isolation can be configured and
re-configured on a per-session basis:

@snippet ex_all.c session isolation configuration

@snippet ex_all.c session isolation re-configuration

@section transaction_named_snapshots Named Snapshots

Applications can create named snapshots by calling WT_SESSION::snapshot
with a configuration that includes <code>"name=foo"</code>.
This configuration creates a new named snapshot, as if a snapshot isolation
transaction were started at the time of the WT_SESSION::snapshot call.

Subsequent transactions can be started "as of" that snapshot by calling
WT_SESSION::begin_transaction with a configuration that includes
<code>snapshot=foo</code>.  That transaction will run at snapshot isolation
as if the transaction started at the time of the WT_SESSION::snapshot
call that created the snapshot.

Named snapshots keep data pinned in cache as if a real transaction were
running for the time that the named snapshot is active.  The resources
associated with named snapshots should be released by calling
WT_SESSION::snapshot with a configuration that includes
<code>"drop="</code>. See WT_SESSION::snapshot documentation for details of
the semantics supported by the drop configuration.

Named snapshots are not durable: they do not survive WT_CONNECTION::close.

@section transaction_timestamps Application-specified Transaction Timestamps

@subsection timestamp_overview Timestamp overview

Some applications have their own notion of time, including an expected commit
order for transactions that may be inconsistent with the order assigned by
WiredTiger. We assume applications can represent their notion of a timestamp
as an unsigned 64-bit integral value that generally increases over time. For
example, a counter could be incremented to generate transaction timestamps,
if that is sufficient for the application.

Applications can assign explicit commit timestamps to transactions, then read
"as of" a timestamp. The timestamp mechanism operates in parallel with
WiredTiger's internal transaction ID management. It is recommended that once
timestamps are in use for a particular table, all subsequent updates also use
timestamps.

@subsection timestamp_transactions Using transactions with timestamps

Applications that use timestamps will generally provide a timestamp at
WT_SESSION::transaction_commit that will be assigned to all updates that are
part of the transaction. WiredTiger also provides the ability to set a different
commit timestamp for different updates in a single transaction. This can
be done by calling WT_SESSION::timestamp_transaction repeatedly to set a new
commit timestamp between a set of updates for the current transaction. This
gives the ability to commit updates with different read "as of" timestamps in a
single transaction.

Setting a read timestamp in WT_SESSION::begin_transaction forces a transaction
to run at snapshot isolation and ignore any commits with a newer timestamp.

Commit timestamps cannot be set in the past of any read timestamp that has
been used.  This is enforced by assertions in diagnostic builds, if
applications violate this rule, data consistency can be violated.

The commits to a particular data item must be performed in timestamp order.
If applications violate this rule, data consistency can be violated.

The WT_SESSION::prepare_transaction API is designed to be used in conjunction
with timestamps and assigns a prepare timestamp to the transaction, which will
be used for visibility checks until the transaction is committed or aborted.
Once a transaction has been prepared the only other operations that can be
completed are WT_SESSION::commit_transaction or
WT_SESSION::rollback_transaction. The WT_SESSION::prepare_transaction API only
guarantees that transactional conflicts will not cause the transaction to
rollback - it does not guarantee that the transactions updates are durable. If
a read operation encounters an update from a prepared transaction a
WT_PREPARE_CONFLICT error will be returned indicating that it is not possible
to choose a version of data to return until a prepared transaction is resolved,
it is reasonable to retry such operations.

Durability of the data updates performed by a prepared transaction, on tables
configured with log=(enabled=false), can be controlled by specifying a durable
timestamp during WT_SESSION::commit_transaction. Checkpoint will consider the
durable timestamp, instead of commit timestamp for persisting the data updates.
If the durable timestamp is not specified, then the commit timestamp will be
considered as the durable timestamp.

@subsection timestamp_connection Managing global timestamp state

Applications that use timestamps need to manage some global state in order
to allow WiredTiger to clean up old updates, and not make new updates durable
until it is safe to do so. That state is managed using the
WT_CONNECTION::set_timestamp API.

Setting an oldest timestamp in WT_CONNECTION::set_timestamp indicates that
future read timestamps will be at least as recent as the oldest timestamp, so
WiredTiger can discard history before the specified point.  It is critical
that the oldest timestamp update frequently or the cache can become full of
updates, reducing performance.

Setting a stable timestamp in WT_CONNECTION::set_timestamp indicates a
known stable location that is sufficient for durability.  During a checkpoint
the state of a table will be saved only as of the stable timestamp.  Newer
updates after that stable timestamp will not be included in the checkpoint.
That can be overridden in the call to WT_SESSION::checkpoint.  It is expected
that the stable timestamp is updated frequently. Setting a stable location
provides the ability, if needed, to rollback to this location by placing a call
to WT_CONNECTION::rollback_to_stable. With the rollback, however, WiredTiger
does not automatically reset the maximum commit timestamp it is tracking. The
application should explicitly do so by setting a commit timestamp in
WT_CONNECTION::set_timestamp.

@subsection timestamp_extensions Timestamp support in the extension API

The extension API, used by modules that extend WiredTiger via
WT_CONNECTION::get_extension_api, is not timestamp-aware.  In particular,
WT_EXTENSION_API::transaction_oldest and
WT_EXTENSION_API::transaction_visible do not take timestamps into account.
Extensions relying on these functions may not work correctly with timestamps.
 */