Diffstat (limited to 'src/third_party/wiredtiger/src/docs/arch-transaction.dox')
-rw-r--r-- | src/third_party/wiredtiger/src/docs/arch-transaction.dox | 194 |
1 files changed, 194 insertions, 0 deletions
diff --git a/src/third_party/wiredtiger/src/docs/arch-transaction.dox b/src/third_party/wiredtiger/src/docs/arch-transaction.dox
index bc3c4e59722..d15a3cbb8d5 100644
--- a/src/third_party/wiredtiger/src/docs/arch-transaction.dox
+++ b/src/third_party/wiredtiger/src/docs/arch-transaction.dox
@@ -5,4 +5,198 @@ A caller of WiredTiger uses @ref transactions within the API to start and stop t
 a session (thread of control). Internally, the current transaction state is represented by the
 WT_TXN structure.
+
+Except for schema operations, WiredTiger performs all read and write operations within a
+transaction. If the user doesn't explicitly begin a transaction, WiredTiger will automatically
+create a transaction for the user's operation.
+
+@section Lifecycle
+
+A WiredTiger session creates transactions and manages their lifecycle. Only one transaction can
+run at a time per session, and that transaction must complete before another transaction can be
+started. Since every session is single-threaded, all the operations in the transaction are executed
+on the same thread.
+
+@plantuml_start{transaction_lifecycle.png}
+@startuml{transaction_lifecycle.png}
+:Transaction Lifecycle;
+
+split
+ :perform a read operation
+ (Create an auto transaction);
+split again
+ :perform a write operation
+ (Create an auto transaction with a transaction id);
+split again
+ :declare the beginning of a transaction;
+ :perform read operations;
+ :perform a write operation
+ (Assign a transaction id);
+ :perform read write operations;
+split again
+ :declare the beginning of a transaction;
+ :perform read operations;
+ :perform a write operation
+ (Assign a transaction id);
+ :perform read write operations;
+ :prepare the transaction;
+end split
+
+split
+ :rollback;
+split again
+ :commit;
+end split
+
+Stop
+@enduml
+@plantuml_end
+
+A transaction starts in two scenarios: when the user calls begin via
+WT_SESSION::begin_transaction, or internally when the user performs either a read or write
+operation. An internal transaction is only started if the operation is not already within a running
+transaction. If declared explicitly, the transaction will be active until it is committed or rolled
+back. If it is created internally, it will cease to be active after the user operation either
+successfully completes or fails.
+
+If the transaction is committed successfully, every write operation it performed is accepted by the
+database and will be durable to some extent based on the durability setting. Otherwise, all the
+write operations it has done will be reverted and will not be available any more.
+
+@section ACID Properties
+
+Like other databases, transactions in WiredTiger enforce the ACID properties (atomicity,
+consistency, isolation, and durability).
+
+@subsection Atomicity
+
+All write operations initially happen in memory in WiredTiger and will not be written to disk until
+the entire transaction is committed. Therefore, the size of the transaction must fit in memory.
+
+To roll back the transaction, WiredTiger only needs to mark all the write operations of that
+transaction as aborted in memory. To ensure no partial transaction is persisted to disk, the
+eviction threads and the checkpoint threads will do proper visibility checks to make sure each
+persisted operation is actually visible with regard to their snapshot.
+
+There is one case in which the atomicity of transactions is not honored when using timestamps. If
+the operations in the same transaction are conducted at different timestamps and the checkpoint
+happens in between the timestamps, only the operations that happen at or before the checkpoint
+timestamp will be persisted in the checkpoint, and the operations in the transaction that happen
+after the checkpoint timestamp will be discarded.
+
+Atomicity may also be violated after a restart if a transaction operates both on tables with
+logging enabled and on tables with logging disabled. The operations on the tables with logging
+enabled will survive the restart, while the operations on the non-logged tables may be lost if
+they are not included in the latest checkpoint.
+
+@subsection Isolation
+
+Isolation determines whether one transaction can read updates made by other concurrent
+transactions. WiredTiger supports three isolation levels: read uncommitted, read committed, and
+snapshot. However, only snapshot is supported for write operations. By default, WiredTiger runs
+in snapshot isolation.
+
+1. Under snapshot isolation, a transaction is able to see updates done by other transactions
+that are committed before it starts.
+
+2. Under read committed isolation, a transaction is able to see updates done by other
+transactions that have been committed when the reading happens.
+
+3. Under read uncommitted isolation, a transaction is able to see updates done by all the
+existing transactions, including the concurrent ones.
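
Reviewer note: the three isolation levels listed above can be illustrated with a small toy model. This is only a sketch and not WiredTiger code. `ToyStore`, `Update`, and the `isolation` strings are hypothetical names chosen for illustration, and the model ignores timestamps and a reader's own writes.

```python
from dataclasses import dataclass


@dataclass
class Update:
    txn_id: int   # id of the transaction that wrote the value
    value: str


class ToyStore:
    """Hypothetical single-key store used only to illustrate isolation levels."""

    def __init__(self):
        self.committed = set()  # ids of transactions that have committed
        self.chain = []         # updates to the key, newest first

    def write(self, txn_id, value):
        self.chain.insert(0, Update(txn_id, value))

    def commit(self, txn_id):
        self.committed.add(txn_id)

    def read(self, isolation, committed_at_start=frozenset()):
        """Return the newest value the reader may see, or None."""
        for upd in self.chain:
            if isolation == "read-uncommitted":
                return upd.value  # sees everything, even uncommitted writes
            if isolation == "read-committed" and upd.txn_id in self.committed:
                return upd.value  # committed by the time of this read
            if isolation == "snapshot" and upd.txn_id in committed_at_start:
                return upd.value  # committed before the reader started
        return None


store = ToyStore()
store.write(1, "a")
store.commit(1)
snapshot = frozenset(store.committed)  # the reader starts here
store.write(2, "b")                    # concurrent, still uncommitted

before = {
    "read-uncommitted": store.read("read-uncommitted"),
    "read-committed": store.read("read-committed"),
    "snapshot": store.read("snapshot", snapshot),
}
store.commit(2)
after = {
    "read-committed": store.read("read-committed"),
    "snapshot": store.read("snapshot", snapshot),
}
```

In this model the read uncommitted reader sees `"b"` immediately, the read committed reader switches from `"a"` to `"b"` once transaction 2 commits, and the snapshot reader keeps seeing `"a"` throughout.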
+
+Each transaction in WiredTiger is given a globally unique transaction id before doing the first
+write operation and this id is written to each operation done by the same transaction. If the
+transaction is running under snapshot isolation or read committed isolation, it will obtain a
+transaction snapshot, which includes a list of uncommitted concurrent transactions' ids, at the
+appropriate time to check the visibility of updates. A snapshot transaction obtains it at the
+beginning of the transaction and uses the same snapshot across its whole life cycle. A
+read committed transaction obtains a new snapshot every time it does a search before
+reading. Due to the overhead of obtaining a snapshot, it uses the same snapshot for all the reads
+before calling another search. Read uncommitted transactions don't have a snapshot.
+
+If the transaction has a snapshot, each read will check whether the update's transaction id is in
+its snapshot. The updates with transaction ids in the snapshot or larger than the largest
+transaction id in the snapshot are not visible to the reading transaction.
+
+When operating in read committed or read uncommitted isolation levels, it is possible to read
+different values of the same key (a non-repeatable read), or to see records appear or disappear
+(a phantom read), in the same transaction. Under snapshot isolation, WiredTiger guarantees that
+repeated reads return the same result, except in one scenario using timestamps.
+
+@subsection Timestamps
+
+WiredTiger provides a mechanism to control when operations should be visible, called timestamps.
+Timestamps are user-specified sequence numbers that are associated with each operation. In
+addition, users can assign an immutable read timestamp to a transaction at the beginning. A
+transaction can only see updates with timestamps less than or equal to its read timestamp. Note
+that read timestamp 0 means no read timestamp, and the transaction can see updates regardless of
+timestamps. Also note that timestamps don't have to be derived from physical times. Users can use
+any 64-bit unsigned integer as a logical timestamp. Within a single transaction, the timestamps
+associated with its operations don't have to be the same as long as they are monotonically
+increasing.
+
+Apart from the operation level timestamps, the users are also responsible for managing the global
+level timestamps, i.e., the oldest timestamp and the stable timestamp. The oldest timestamp is the
+timestamp that should be visible to all concurrent transactions. The stable timestamp is the
+minimum timestamp that a new operation can commit at.
+
+Only transactions running in snapshot isolation can run with timestamps.
+
+@subsection Visibility
+
+Visibility in WiredTiger considers both the operations' transaction ids and their
+timestamps. An operation is visible only when both its transaction id and its timestamp are
+visible to the reading transaction.
+
+To read a key, WiredTiger first traverses all the updates of that key still in memory until a
+visible update is found. The in-memory updates in WiredTiger are organized as a singly linked list
+with the newest update at the head, called the update chain. If no value is visible on the update
+chain, it checks the version on the disk image, which is the version that was chosen to be written
+to disk in the last reconciliation. If it is still invisible, WiredTiger will search the history
+store to check if there is a version visible to the reader there.
+
+The repeated read guarantee under snapshot isolation may break in one case if the timestamps
+committed to the updates are out of order, e.g.,
+
+`U@20 -> U@30 -> U@15`
+
+In the above example, reading with timestamp 15 is not guaranteed to return the third update. In
+some cases, users may read the second update U@30 if it is moved to the history store.
+
+@subsection Durability
+
+WiredTiger transactions support commit level durability and checkpoint level durability. An
+operation is commit level durable if logging is enabled on the table (@ref arch-logging). After it
+has been successfully committed, the operation is guaranteed to survive restart. An operation will
+only survive across restart under checkpoint durability if it is included in the last successful
+checkpoint.
+
+@section Prepared Transactions
+
+WiredTiger introduces prepared transactions to meet the needs of implementing distributed
+transactions through two-phase commit. Prepared transactions only work under snapshot isolation.
+
+In addition to the beginning, operating, and rollback or commit phases, a prepared transaction has
+a prepared phase before the rollback or commit phase. After prepare is called, WiredTiger releases
+the transaction's snapshot and prohibits any more read or write operations on the transaction.
+
+By introducing the prepared stage, a two-phase distributed transaction algorithm can rely on the
+prepared state to reach consensus among all the nodes for committing.
+
+Along with the prepared phase, WiredTiger introduces the prepared timestamp and durable timestamp.
+They prevent slow prepared transactions from blocking the movement of the global stable
+timestamp, which may cause excessive amounts of data to be pinned in memory. The stable timestamp
+is allowed to move beyond the prepared timestamp, and at commit time the prepared transaction
+can then be committed after the current stable timestamp with a larger durable timestamp. The
+durable timestamp also marks the time the update is to be stable. If the stable timestamp is moved
+to or beyond the durable timestamp of an update, it will not be removed by rollback to stable from
+a checkpoint. See @ref arch-rts for more details.
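
Reviewer note: the relationship between the stable timestamp and the durable timestamp described above can be sketched as a one-line predicate. This is a toy model under stated assumptions, not the rollback-to-stable implementation. The `Update` fields and the function name are hypothetical.

```python
from typing import NamedTuple


class Update(NamedTuple):
    value: str
    commit_ts: int   # timestamp at which the update becomes visible
    durable_ts: int  # timestamp at which the update is considered stable


def survives_rollback_to_stable(update, stable_ts):
    """An update is kept once the stable timestamp reaches its durable timestamp."""
    return update.durable_ts <= stable_ts


# A prepared transaction committed with a durable timestamp later than its
# commit timestamp (values chosen arbitrarily for illustration).
upd = Update("v", commit_ts=10, durable_ts=25)
kept_at_20 = survives_rollback_to_stable(upd, stable_ts=20)
kept_at_25 = survives_rollback_to_stable(upd, stable_ts=25)
```

With the stable timestamp at 20 the update would be removed, even though its commit timestamp is 10. Once the stable timestamp reaches 25 it is kept.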
+
+The visibility of the prepared transaction is also special when in the prepared state. Since in the
+prepared state, the transaction has released its snapshot, it should be visible to the transactions
+starting after that based on the normal visibility rule. However, the prepared transaction has not
+been committed and cannot be visible yet. In this situation, WiredTiger will return a
+WT_PREPARE_CONFLICT to indicate to the caller to retry later, or if configured WiredTiger will
+ignore the prepared update and read older updates.
 */
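
Reviewer note: the two reader behaviours for prepared updates described in the last paragraph can be sketched as follows. This is a simplified model, not WiredTiger code. `PrepareConflict`, `read_visible`, and the `(state, value)` tuples are hypothetical, and every update is assumed to pass the ordinary transaction id and timestamp checks.

```python
class PrepareConflict(Exception):
    """Stands in for the WT_PREPARE_CONFLICT return described in the text."""


def read_visible(chain, ignore_prepare=False):
    """Walk an update chain (newest first) and return the first readable value.

    Each entry is a (state, value) tuple where state is "prepared",
    "committed", or "aborted".
    """
    for state, value in chain:
        if state == "prepared":
            if ignore_prepare:
                continue            # skip the prepared update, read older ones
            raise PrepareConflict() # caller is expected to retry later
        if state == "committed":
            return value
    return None


chain = [("prepared", "new"), ("committed", "old")]

try:
    read_visible(chain)
    conflict = False
except PrepareConflict:
    conflict = True

older = read_visible(chain, ignore_prepare=True)
```

By default the reader gets a conflict and must retry. With `ignore_prepare=True` it skips the prepared update and reads `"old"`.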