diff options
author | Marko Mäkelä <marko.makela@mariadb.com> | 2017-07-07 13:08:16 +0300 |
---|---|---|
committer | Marko Mäkelä <marko.makela@mariadb.com> | 2017-07-07 13:08:48 +0300 |
commit | 3c09f148f362a587ac3267c31fd17da5f71a0b11 (patch) | |
tree | e90c734802cd98006a09ea8db016f1116929240a /storage/innobase/row/row0vers.cc | |
parent | bae0844f6575767ee9ab730835a12779f13da685 (diff) | |
download | mariadb-git-3c09f148f362a587ac3267c31fd17da5f71a0b11.tar.gz |
MDEV-12288 Reset DB_TRX_ID when the history is removed, to speed up MVCC
Let InnoDB purge reset DB_TRX_ID,DB_ROLL_PTR when the history is removed.
[TODO: It appears that the resetting is not taking place as often as
it could be. We should test that a simple INSERT should eventually
cause row_purge_reset_trx_id() to be invoked unless DROP TABLE is
invoked soon enough.]
The InnoDB clustered index record system columns DB_TRX_ID,DB_ROLL_PTR
are used by multi-versioning. After the history is no longer needed, these
columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert).
When a reader sees 0 in the DB_TRX_ID column, it can instantly determine
that the record is present the read view. There is no need to acquire
the transaction system mutex to check if the transaction exists, because
writes can never be conducted by a transaction whose ID is 0.
The persistent InnoDB undo log used to be split into two parts:
insert_undo and update_undo. The insert_undo log was discarded at
transaction commit or rollback, and the update_undo log was processed
by the purge subsystem. As part of this change, we will only generate
a single undo log for new transactions, and the purge subsystem will
reset the DB_TRX_ID whenever a clustered index record is touched.
That is, all persistent undo log will be preserved at transaction commit
or rollback, to be removed by purge.
The InnoDB redo log format is changed in two ways:
We remove the redo log record type MLOG_UNDO_HDR_REUSE, and
we introduce the MLOG_ZIP_WRITE_TRX_ID record for updating the
DB_TRX_ID,DB_ROLL_PTR in a ROW_FORMAT=COMPRESSED table.
This is also changing the format of persistent InnoDB data files:
undo log and clustered index leaf page records. It will still be
possible via import and export to exchange data files with earlier
versions of MariaDB. The change to clustered index leaf page records
is simple: we allow DB_TRX_ID to be 0.
When it comes to the undo log, we must be able to upgrade from earlier
MariaDB versions after a clean shutdown (no redo log to apply).
While it would be nice to perform a slow shutdown (innodb_fast_shutdown=0)
before an upgrade, to empty the undo logs, we cannot assume that this
has been done. So, separate insert_undo log may exist for recovered
uncommitted transactions. These transactions may be automatically
rolled back, or they may be in XA PREPARE state, in which case InnoDB
will preserve the transaction until an explicit XA COMMIT or XA ROLLBACK.
Upgrade has been tested by starting up MariaDB 10.2 with
./mysql-test-run --manual-gdb innodb.read_only_recovery
and then starting up this patched server with
and without --innodb-read-only.
trx_undo_ptr_t::undo: Renamed from update_undo.
trx_undo_ptr_t::old_insert: Renamed from insert_undo.
trx_rseg_t::undo_list: Renamed from update_undo_list.
trx_rseg_t::undo_cached: Merged from update_undo_cached
and insert_undo_cached.
trx_rseg_t::old_insert_list: Renamed from insert_undo_list.
row_purge_reset_trx_id(): New function to reset the columns.
This will be called for all undo processing in purge
that does not remove the clustered index record.
trx_undo_update_rec_get_update(): Allow trx_id=0 when copying the
old DB_TRX_ID of the record to the undo log.
ReadView::changes_visible(): Allow id==0. (Return true for it.
This is what speeds up the MVCC.)
row_vers_impl_x_locked_low(), row_vers_build_for_semi_consistent_read():
Implement a fast path for DB_TRX_ID=0.
Always initialize the TRX_UNDO_PAGE_TYPE to 0. Remove undo->type.
MLOG_UNDO_HDR_REUSE: Remove. This changes the redo log format!
innobase_start_or_create_for_mysql(): Set srv_undo_sources before
starting any transactions.
The parsing of the MLOG_ZIP_WRITE_TRX_ID record was successfully
tested by running the following:
./mtr --parallel=auto --mysqld=--debug=d,ib_log innodb_zip.bug56680
grep MLOG_ZIP_WRITE_TRX_ID var/*/log/mysqld.1.err
Diffstat (limited to 'storage/innobase/row/row0vers.cc')
-rw-r--r-- | storage/innobase/row/row0vers.cc | 17 |
1 files changed, 17 insertions, 0 deletions
diff --git a/storage/innobase/row/row0vers.cc b/storage/innobase/row/row0vers.cc index de33c7c4d1b..26f6739c3ae 100644 --- a/storage/innobase/row/row0vers.cc +++ b/storage/innobase/row/row0vers.cc @@ -97,12 +97,25 @@ row_vers_impl_x_locked_low( ut_ad(rec_offs_validate(rec, index, offsets)); + if (ulint trx_id_offset = clust_index->trx_id_offset) { + trx_id = mach_read_from_6(clust_rec + trx_id_offset); + if (trx_id == 0) { + /* The transaction history was already purged. */ + DBUG_RETURN(0); + } + } + heap = mem_heap_create(1024); clust_offsets = rec_get_offsets( clust_rec, clust_index, NULL, ULINT_UNDEFINED, &heap); trx_id = row_get_rec_trx_id(clust_rec, clust_index, clust_offsets); + if (trx_id == 0) { + /* The transaction history was already purged. */ + mem_heap_free(heap); + DBUG_RETURN(0); + } corrupt = FALSE; trx_t* trx = trx_rw_is_active(trx_id, &corrupt, true); @@ -1262,6 +1275,10 @@ row_vers_build_for_semi_consistent_read( rec_trx_id = version_trx_id; } + if (!version_trx_id) { + goto committed_version_trx; + } + trx_sys_mutex_enter(); version_trx = trx_get_rw_trx_by_id(version_trx_id); /* Because version_trx is a read-write transaction, |