summaryrefslogtreecommitdiff
path: root/storage/innobase/row/row0vers.cc
diff options
context:
space:
mode:
authorMarko Mäkelä <marko.makela@mariadb.com>2017-07-07 13:08:16 +0300
committerMarko Mäkelä <marko.makela@mariadb.com>2017-07-07 13:08:48 +0300
commit3c09f148f362a587ac3267c31fd17da5f71a0b11 (patch)
treee90c734802cd98006a09ea8db016f1116929240a /storage/innobase/row/row0vers.cc
parentbae0844f6575767ee9ab730835a12779f13da685 (diff)
downloadmariadb-git-3c09f148f362a587ac3267c31fd17da5f71a0b11.tar.gz
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC
Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed. [TODO: It appears that the resetting is not taking place as often as it could be. We should test that a simple INSERT should eventually cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is invoked soon enough.] The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR are used by multi-versioning. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0. The persistent InnoDB undo log used to be split into two parts: insert_undo and update_undo. The insert_undo log was discarded at transaction commit or rollback, and the update_undo log was processed by the purge subsystem. As part of this change, we will only generate a single undo log for new transactions, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is touched. That is, all persistent undo log will be preserved at transaction commit or rollback, to be removed by purge. The InnoDB redo log format is changed in two ways: We remove the redo log record type MLOG_UNDO_HDR_REUSE, and we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table. This is also changing the format of persistent InnoDB data files: undo log and clustered index leaf page records. It will still be possible via import and export to exchange data files with earlier versions of MariaDB. The change to clustered index leaf page records is simple: we allow DB_TRX_ID to be 0. When it comes to the undo log, we must be able to upgrade from earlier MariaDB versions after a clean shutdown (no redo log to apply). While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0) before an upgrade, to empty the undo logs, we cannot assume that this has been done. So, separate insert_undo log may exist for recovered uncommitted transactions. These transactions may be automatically rolled back, or they may be in XA PREPARE state, in which case InnoDB will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK. Upgrade has been tested by starting up MariaDB 10.2 with ./mysql-test-run --manual-gdb innodb.read_only_recovery and then starting up this patched server with and without --innodb-read-only. trx_undo_ptr_t::undo: Renamed from update_undo. trx_undo_ptr_t::old_insert: Renamed from insert_undo. trx_rseg_t::undo_list: Renamed from update_undo_list. trx_rseg_t::undo_cached: Merged from update_undo_cached and insert_undo_cached. trx_rseg_t::old_insert_list: Renamed from insert_undo_list. row_purge_reset_trx_id(): New function to reset the columns. This will be called for all undo processing in purge that does not remove the clustered index record. trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the old DB_TRX_ID of the record to the undo log. ReadView::changes_visible(): Allow id==0. (Return true for it. This is what speeds up the MVCC.) row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read(): Implement a fast path for DB_TRX_ID=0. Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type. MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format! innobase_start_or_create_for_mysql(): Set srv_undo_sources before starting any transactions. The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully tested by running the following: ./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680 grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
Diffstat (limited to 'storage/innobase/row/row0vers.cc')
-rw-r--r--storage/innobase/row/row0vers.cc17
1 files changed, 17 insertions, 0 deletions
diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc
index de33c7c4d1b..26f6739c3ae 100644
--- a/storage/innobase/row/row0vers.cc
+++ b/storage/innobase/row/row0vers.cc
@@ -97,12 +97,25 @@ row_vers_impl_x_locked_low(
ut_ad(rec_offs_validate(rec, index, offsets));
+ if (ulint trx_id_offset = clust_index->trx_id_offset) {
+ trx_id = mach_read_from_6(clust_rec + trx_id_offset);
+ if (trx_id == 0) {
+ /* The transaction history was already purged. */
+ DBUG_RETURN(0);
+ }
+ }
+
heap = mem_heap_create(1024);
clust_offsets = rec_get_offsets(
clust_rec, clust_index, NULL, ULINT_UNDEFINED, &heap);
trx_id = row_get_rec_trx_id(clust_rec, clust_index, clust_offsets);
+ if (trx_id == 0) {
+ /* The transaction history was already purged. */
+ mem_heap_free(heap);
+ DBUG_RETURN(0);
+ }
corrupt = FALSE;
trx_t* trx = trx_rw_is_active(trx_id, &corrupt, true);
@@ -1262,6 +1275,10 @@ row_vers_build_for_semi_consistent_read(
rec_trx_id = version_trx_id;
}
+ if (!version_trx_id) {
+ goto committed_version_trx;
+ }
+
trx_sys_mutex_enter();
version_trx = trx_get_rw_trx_by_id(version_trx_id);
/* Because version_trx is a read-write transaction,