author Marko Mäkelä <marko.makela@mariadb.com> 2020-12-28 12:06:22 +0200
committer Marko Mäkelä <marko.makela@mariadb.com> 2020-12-28 12:06:22 +0200
commit 5b9ee8d8193a8c7a8ebdd35eedcadc3ae78e7fc1 (patch)
tree 136c29d054b5634e03deb48b9dbdf17f919f8b4c
parent 8e3e87d2fc1e63d287f203d441dcb9360775c6b7 (diff)
MDEV-24449 Corruption of system tablespace or last recovered page
This corresponds to 10.5 commit 39378e1366f78b38c05e45103b9fb9c829cc5f4f.

With a patched version of the test innodb.ibuf_not_empty (so that it would trigger crash recovery after using the change buffer), and patched code that would modify the os_thread_sleep() in recv_apply_hashed_log_recs() to be 1ms as well as add a sleep of the same duration to the end of recv_recover_page() when recv_sys->n_addrs=0, we can demonstrate a race condition.

After disabling some debug checks in buf_all_freed_instance(), buf_pool_invalidate_instance() and buf_validate(), we managed to trigger an assertion failure in fseg_free_step(), on the XDES_FREE_BIT. In other words, a trx_undo_seg_free() call during trx_rollback_resurrected() was attempting a double-free of a page. This was repeated about once in 400 to 500 test runs. With the fix applied, the test passed 2,000 runs.

recv_apply_hashed_log_recs(): Do not only wait for recv_sys->n_addrs to reach 0, but also wait for buf_get_n_pending_read_ios() to reach 0, to guarantee that buf_page_io_complete() will not be executing ibuf_merge_or_delete_for_page().
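
The effect of the changed wait condition can be illustrated with a minimal standalone sketch. This is not InnoDB code: `RecoveryState`, `keeps_waiting_before_fix()` and `keeps_waiting_after_fix()` are hypothetical names, and the two counters merely stand in for recv_sys->n_addrs and buf_get_n_pending_read_ios(). It only models the loop condition, not the recovery logic itself.

```cpp
#include <atomic>

// Hypothetical stand-in for the two counters involved; not InnoDB code.
struct RecoveryState {
    std::atomic<unsigned> n_addrs{0};        // models recv_sys->n_addrs
    std::atomic<unsigned> pending_reads{0};  // models buf_get_n_pending_read_ios()
};

// Loop condition before the fix: only the count of pages that still
// have buffered redo log records is consulted.
inline bool keeps_waiting_before_fix(const RecoveryState& s) {
    return s.n_addrs.load() != 0;
}

// Loop condition after the fix: also keep waiting while any page read
// is still pending, because its completion may still modify pages.
inline bool keeps_waiting_after_fix(const RecoveryState& s) {
    return s.n_addrs.load() != 0 || s.pending_reads.load() != 0;
}
```

In this model, the problematic window is the state where `n_addrs` is 0 while `pending_reads` is still nonzero: the old condition stops waiting, while the new condition keeps waiting until the read completion has finished.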
-rw-r--r-- storage/innobase/log/log0recv.cc 2
1 files changed, 1 insertions, 1 deletions
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 4c3886caeaf..95179ec2271 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -2501,7 +2501,7 @@ apply:
 		/* Wait until all the pages have been processed */
-		while (recv_sys->n_addrs != 0) {
+		while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) {
 			const bool abort = recv_sys->found_corrupt_log
 				|| recv_sys->found_corrupt_fs;