MDEV-21910: Killing thread on Galera could cause mutex deadlock (bb-10.3-MDEV-21910)

Whenever a Galera BF (brute force) transaction decides to abort a conflicting transaction, it kills that thread using thd::awake().

A user KILL [QUERY|CONNECTION] ... for a thread also calls thd::awake().

Whenever one of these actions is executed, we hold a number of InnoDB internal mutexes and thd mutexes. Sometimes these mutexes are taken in a different order, causing a mutex deadlock.

Let's call the BF kill bf_thread and the user KILL query kill_thread.

bf_thread takes mutexes in this order:
(1) lock_sys->mutex (lock0lock.cc lock_rec_other_has_conflicting)
(2) victim_trx->mutex (lock0lock.cc lock_rec_other_has_conflicting)
(3) victim_thread->LOCK_thd_data (handler.cc wsrep_innobase_kill_one_trx)

kill_thread takes mutexes in this order:
(1) victim_thread->LOCK_thd_data (sql_parse.cc find_thread_by_id)
(2) lock_sys->mutex (ha_innodb.cc innobase_kill_query)
(3) victim_trx->mutex (ha_innodb.cc innobase_kill_query)

The deadlock results from victim_thread->LOCK_thd_data being taken last by bf_thread but first by kill_thread.

This patch fixes the Galera BF victim thread kill so that it does not try to lock the victim_thread->LOCK_thd_data mutex while InnoDB mutexes are held. Instead, the victim is inserted into a list for later kill processing. A new background thread picks the victim thread from this list and calls thd::awake() with no InnoDB mutexes held. The idea is similar to the replication background kill.

This fix enforces that mutexes are always taken in the same order:
(1) victim_thread->LOCK_thd_data
(2) lock_sys->mutex
(3) victim_trx->mutex

wsrep_mysqld.cc
Introduce a list where victim threads are stored, a condition variable used to wake up the background thread, and a mutex to protect the list.

wsrep_thd.cc
Create a new background thread to handle victim thread abort. This thread may take the victim_thread->LOCK_thd_data mutex but not any InnoDB mutexes.

wsrep_innobase_kill_one_trx
Remove all the wsrep code that was moved to wsrep_thd.cc. We only enqueue the required information to the background kill list and cancel the victim trx lock wait, if there is one. Here we hold the InnoDB lock_sys->mutex and victim_trx->mutex, so we cannot take the victim_thread->LOCK_thd_data mutex.

wsrep_abort_transaction
Cleanup only.
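
The deferred-kill idea described above can be illustrated with a minimal, standalone C++ sketch. It is not the code from this patch: KillQueue, demo_bf_kill and the three global mutexes are hypothetical stand-ins for the list, condition variable and mutex added in wsrep_mysqld.cc and for lock_sys->mutex, victim_trx->mutex and victim_thread->LOCK_thd_data. The sketch assumes the BF side only enqueues a thread id while holding the two InnoDB-like mutexes, and that the background thread takes the mutexes in the order (1) LOCK_thd_data, (2) lock_sys, (3) trx.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for lock_sys->mutex, victim_trx->mutex and
// victim_thread->LOCK_thd_data; not the real MariaDB objects.
static std::mutex lock_sys_mutex;
static std::mutex victim_trx_mutex;
static std::mutex lock_thd_data;

// Hypothetical background kill list: a queue, a mutex protecting it and a
// condition variable used to wake up the killer thread.
class KillQueue {
 public:
  // Called by the BF side while it holds the InnoDB-like mutexes.
  // Only the queue mutex is taken here, never lock_thd_data.
  void enqueue(unsigned long thd_id) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      victims_.push_back(thd_id);
    }
    cond_.notify_one();
  }

  // Ask the killer thread to exit once the queue is drained.
  void shutdown() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      stop_ = true;
    }
    cond_.notify_all();
  }

  // Body of the background killer thread.
  void run(std::vector<unsigned long>& killed) {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cond_.wait(lock, [this] { return stop_ || !victims_.empty(); });
      if (victims_.empty()) return;  // stop requested and nothing left
      unsigned long thd_id = victims_.front();
      victims_.pop_front();
      lock.unlock();  // do not hold the queue mutex during the kill
      {
        // Same order as the user KILL path: thd data first, then InnoDB.
        std::lock_guard<std::mutex> thd(lock_thd_data);
        std::lock_guard<std::mutex> sys(lock_sys_mutex);
        std::lock_guard<std::mutex> trx(victim_trx_mutex);
        killed.push_back(thd_id);  // stands in for the actual wake-up call
      }
      lock.lock();
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::deque<unsigned long> victims_;
  bool stop_ = false;
};

// Simulates two BF aborts and returns the ids in the order they were killed.
std::vector<unsigned long> demo_bf_kill() {
  KillQueue queue;
  std::vector<unsigned long> killed;
  std::thread killer([&] { queue.run(killed); });

  for (unsigned long id : {11UL, 12UL}) {
    // The BF side holds the InnoDB-like mutexes and only enqueues.
    std::lock_guard<std::mutex> sys(lock_sys_mutex);
    std::lock_guard<std::mutex> trx(victim_trx_mutex);
    queue.enqueue(id);
  }

  queue.shutdown();
  killer.join();
  assert(killed.size() == 2);
  return killed;
}
```

Calling demo_bf_kill() from a test or main function returns the victim ids in FIFO order. Because the enqueueing side never waits for lock_thd_data while holding the InnoDB-like mutexes, no cycle can form between the two lock orders in this sketch.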
author: Jan Lindström <jan.lindstrom@mariadb.com> 2020-03-12 15:34:50 +0200
committer: Jan Lindström <jan.lindstrom@mariadb.com> 2020-09-02 20:13:52 +0300
commit: a8d75cd0885707be1791f9dd61723cc5ac0013a6 (patch)
tree: 076f4e872d0ae7796d3b408a35e517a7e3642ca2 /sql/wsrep_var.cc
parent: caa35f8e25ce22d6b4f4c377970354cf582c7f41 (diff)
download: mariadb-git-bb-10.3-MDEV-21910.tar.gz
1 file changed, 1 insertion, 0 deletions
diff --git a/sql/wsrep_var.cc b/sql/wsrep_var.cc
index f18dc565329..be3a55557e7 100644
--- a/sql/wsrep_var.cc
+++ b/sql/wsrep_var.cc
@@ -500,6 +500,7 @@ bool wsrep_cluster_address_update (sys_var *self, THD* thd, enum_var_type type)
   if (wsrep_start_replication())
   {
     wsrep_create_rollbacker();
+    wsrep_create_killer();
     WSREP_DEBUG("Cluster address update creating %ld applier threads running %lu",
 	    wsrep_slave_threads, wsrep_running_applier_threads);
     wsrep_create_appliers(wsrep_slave_threads);