MDEV-6376: InnoDB: Assertion failure in thread 139995225970432

in file buf0mtflu.cc line 570. Analysis: This is a real timing bug. We should take the mutex before we send the shutdown messages; that ensures that threads doing an unfinished flush (the flush request has acquired this mutex) have time to finish their work before we add shutdown messages to the work queue. Currently, we just add the shutdown messages to the work queue, while the flush code assumes that the queue holds a constant number of items to be processed, which leads to the assertion.
author: Jan Lindström <jan.lindstrom@skysql.com> 2014-06-28 13:53:18 +0300
committer: Jan Lindström <jan.lindstrom@skysql.com> 2014-06-28 13:53:18 +0300
commit: b35c5912b651496ad5797bf85eaef3a431235e68 (patch)
tree: 4da06df7031f59ee1a91977ec65699b9053317d8
parent: 36e86bac72ca42ba6537211f39dd0556d5dc1084 (diff)
download: mariadb-git-b35c5912b651496ad5797bf85eaef3a431235e68.tar.gz
2 files changed, 31 insertions, 0 deletions
diff --git a/storage/innobase/buf/buf0mtflu.cc b/storage/innobase/buf/buf0mtflu.cc
index 5a1769e3b70..ded24edc799 100644
--- a/storage/innobase/buf/buf0mtflu.cc
+++ b/storage/innobase/buf/buf0mtflu.cc
@@ -378,6 +378,20 @@ buf_mtflu_io_thread_exit(void)
 	fprintf(stderr, "InnoDB: [Note]: Signal mtflush_io_threads to exit [%lu]\n",
 		srv_mtflush_threads);
 
+	/* This lock is to safequard against timing bug: flush request take
+	this mutex before sending work items to be processed by flush
+	threads. Inside flush thread we assume that work queue contains only
+	a constant number of items. Thus, we may not install new work items
+	below before all previous ones are processed. This mutex is released
+	by flush request after all work items sent to flush threads have
+	been processed. Thus, we can get this mutex if and only if work
+	queue is empty. */
+
+	os_fast_mutex_lock(&mtflush_mtx);
+
+	/* Make sure the work queue is empty */
+	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
+
 	/* Send one exit work item/thread */
 	for (i=0; i < srv_mtflush_threads; i++) {
 		work_item[i].tsk = MT_WRK_NONE;
@@ -399,6 +413,9 @@ buf_mtflu_io_thread_exit(void)
 
 	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
 
+	/* Requests sent */
+	os_fast_mutex_unlock(&mtflush_mtx);
+
 	/* Collect all work done items */
 	for (i=0; i < srv_mtflush_threads;) {
 		wrk_t* work_item = NULL;
diff --git a/storage/xtradb/buf/buf0mtflu.cc b/storage/xtradb/buf/buf0mtflu.cc
index b14b83aa5d0..945bd93a1d3 100644
--- a/storage/xtradb/buf/buf0mtflu.cc
+++ b/storage/xtradb/buf/buf0mtflu.cc
@@ -385,6 +385,17 @@ buf_mtflu_io_thread_exit(void)
 	fprintf(stderr, "InnoDB: [Note]: Signal mtflush_io_threads to exit [%lu]\n",
 		srv_mtflush_threads);
 
+	/* This lock is to safequard against timing bug: flush request take
+	this mutex before sending work items to be processed by flush
+	threads. Inside flush thread we assume that work queue contains only
+	a constant number of items. Thus, we may not install new work items
+	below before all previous ones are processed. This mutex is released
+	by flush request after all work items sent to flush threads have
+	been processed. Thus, we can get this mutex if and only if work
+	queue is empty. */
+
+	os_fast_mutex_lock(&mtflush_mtx);
+
 	/* Send one exit work item/thread */
 	for (i=0; i < srv_mtflush_threads; i++) {
 		work_item[i].tsk = MT_WRK_NONE;
@@ -406,6 +417,9 @@ buf_mtflu_io_thread_exit(void)
 
 	ut_a(ib_wqueue_is_empty(mtflush_io->wq));
 
+	/* Requests sent */
+	os_fast_mutex_unlock(&mtflush_mtx);
+
 	/* Collect all work done items */
 	for (i=0; i < srv_mtflush_threads;) {
 		wrk_t* work_item = NULL;
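
The locking protocol described in the commit message can be illustrated with a small standalone model. This is only a sketch under assumptions, not the InnoDB code: WorkQueue, FlushPool, Task and batch_mtx_ are hypothetical names, and standard C++ primitives stand in for ib_wqueue and os_fast_mutex. It shows a flush request holding one mutex from the moment it queues its items until it has collected every reply, and shutdown taking the same mutex before it queues the exit items, so the exit items can only be added to an empty queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the work items and work queue; none of these
// names come from buf0mtflu.cc.
enum class Task { Flush, Exit };

class WorkQueue {
public:
    void push(Task t) {
        {
            std::lock_guard<std::mutex> g(m_);
            q_.push_back(t);
        }
        cv_.notify_one();
    }
    Task pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [this] { return !q_.empty(); });
        Task t = q_.front();
        q_.pop_front();
        return t;
    }
    bool empty() {
        std::lock_guard<std::mutex> g(m_);
        return q_.empty();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Task> q_;
};

class FlushPool {
public:
    explicit FlushPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker(); });
    }

    // One flush batch: queue one item per worker and hold batch_mtx_ until
    // every reply has been collected. Must not be called after shutdown().
    std::size_t flush() {
        std::lock_guard<std::mutex> batch(batch_mtx_);
        for (std::size_t i = 0; i < workers_.size(); ++i) work_.push(Task::Flush);
        std::size_t replies = 0;
        for (; replies < workers_.size(); ++replies) done_.pop();
        return replies;
    }

    // Shutdown takes the same mutex before queueing exit items, so it cannot
    // interleave with a batch in progress. Returns whether the work queue was
    // empty at that point (the condition the patch asserts with ut_a).
    // Must be called exactly once before the pool is destroyed.
    bool shutdown() {
        bool was_empty;
        {
            std::lock_guard<std::mutex> batch(batch_mtx_);
            was_empty = work_.empty();
            for (std::size_t i = 0; i < workers_.size(); ++i) work_.push(Task::Exit);
        }
        for (auto& t : workers_) t.join();
        return was_empty;
    }

private:
    void worker() {
        for (;;) {
            if (work_.pop() == Task::Exit) return;  // one exit item per worker
            done_.push(Task::Flush);                // report the item as done
        }
    }

    std::mutex batch_mtx_;  // plays the role of mtflush_mtx in this model
    WorkQueue work_;
    WorkQueue done_;
    std::vector<std::thread> workers_;
};
```

In this model, a caller would construct FlushPool pool(4), call pool.flush() as often as needed, and finish with pool.shutdown(). Without batch_mtx_ in shutdown(), exit items could land in the queue while a batch is still being processed, which is the situation the commit message describes.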