delta/mariadb-git.git - github.com: MariaDB/server.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	MDEV-21910 : KIlling thread on Galera could cause mutex deadlock10.2-MDEV-21910	Jan Lindström	2020-03-13	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Following issues here: Whenever Galera BF (brute force) transaction decides to abort conflicting transaction it will kill that thread using thd::awake() User KILL [QUERY\|CONNECTION] ... for a thread it will also call thd::awake() Whenever one of these actions is executed we will hold number of InnoDB internal mutexes and thd mutexes. Sometimes these mutexes are taken in different order causing mutex deadlock. Lets consider a example starting from Galera BF transaction deciding to abort conflicting transaction (let's call this thread1): thread1: lock_rec_other_has_conflicting here we hold both lock_sys mutex and trx mutex for conflicting lock transaction. wsrep_innobase_kill_one_trx For victim we call wsrep_thd_awake Next thread2 we assume user to execute KILL QUERY to some other executing query (note it can't be BF query). thread2: sql_kill() kill_one_thread find_thread_by_id takes LOCK_thd_kill thd->awake_no_mutex() thread1: thd->awake(KILL_QUERY) Tries to have LOCK_thd_kill but it is hold by thread2 so we wait (note that we still hold lock_sys mutex and trx mutex). thread2: ha_kill_query() kill_handlerton innobase_kill_query lock_trx_handle_wait lock_mutex_enter() must wait as thread1 is holding it Thus thread1 lock_sys, trx_mutex waits -> thread2 LOCK_thd_kill waits lock_sys -> thread1 ==> thread1 waits -> thread2 waits -> thread1 ==> mutex deadlock. In this patch we will fix Galera BF and user kill cases so that we enqueue victim thread to a list while we hold InnoDB mutexes and we then release them. A new background thread will pick victim thread from this new list and uses thd::awake() with no InnoDB mutexes. Idea is similar to replication background kill. This fix enforces that we take LOCK_thd_data -> lock sys mutex -> trx mutex always in this order. wsrep_mysqld.cc Here we introduce a list where victim threads are stored, condition variable to be used to wake up background thread and mutex to protect list. wsrep_thd.cc Create a new background thread to handle victim thread abort. We may take wsrep_thd_LOCK mutex here but not any InnoDB mutexes. wsrep_innobase_kill_one_trx Remove all the wsrep code that was moved to wsrep_thd.cc We just enqueue required information to background kill list and cancel victim trx lock wait if there is such. Here we have InnoDB lock sys mutex and trx mutex so here we can't take wsrep_thd_LOCK mutex. wsrep_abort_transaction Cleanup only.
*	MDEV-18562 [ERROR] InnoDB: WSREP: referenced FK check fail: Lock wait index	Jan Lindström	2019-10-30	1	-2/+2
\| \| \| \| \| \|	Lock wait can happen on secondary index when doing FK checks for wsrep. We should just return error to upper layer and applier will retry operation when needed.
*	Merge 10.1 into 10.2	Marko Mäkelä	2019-05-13	1	-1/+1
\|\
\| *	Merge branch '5.5' into 10.1	Vicențiu Ciorbaru	2019-05-11	1	-1/+1
\| \|
\| *	Revert MDEV-18464 and MDEV-12009	Marko Mäkelä	2019-03-28	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 21b2fada7ab7f35c898c02d2f918461409cc9c8e and commit 81d71ee6b21870772c336bff15b71904914f146a. The MDEV-18464 change introduces a few data race issues. Contrary to the documentation, the field trx_t::victim is not always being protected by lock_sys_t::mutex and trx_t::mutex. Most importantly, it seems that KILL QUERY could wrongly avoid acquiring both mutexes when invoking lock_trx_handle_wait_low(), in case another thread had already set trx->victim=true. We also revert MDEV-12009, because it should depend on the MDEV-18464 fix being present.
\| *	MDEV-12009: Allow to force kill user threads/query which are flagged as high ↵	Jan Lindström	2019-03-28	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	priority by Galera As noted on kill_one_thread SUPER should be able to kill even system threads i.e. threads/query flagged as high priority or wsrep applier thread. Normal user, should not able to kill threads/query flagged as high priority (BF) or wsrep applier thread.
\| *	MDEV-9519: Data corruption will happen on the Galera cluster size change	Julius Goryavsky	2019-02-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a 2+ node cluster which is replicating from an async master and the binlog_format is set to STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the server node generates wrong auto_increment values, which are different from what was generated on the async master. In the title of the MDEV-9519 it was proposed to ban start slave on a Galera if master binlog_format = statement and wsrep_auto_increment_control = 1, but the problem can be solved without such a restriction. The causes and fixes: 1. We need to improve processing of changing the auto-increment values after changing the cluster size. 2. If wsrep auto_increment_control switched on during operation of the node, then we should immediately update the auto_increment_increment and auto_increment_offset global variables, without waiting of the next invocation of the wsrep_view_handler_cb() callback. In the current version these variables retain its initial values if wsrep_auto_increment_control is switched on during operation of the node, which leads to inconsistent results on the different nodes in some scenarios. 3. If wsrep auto_increment_control switched off during operation of the node, then we must return the original values of the auto_increment_increment and auto_increment_offset global variables, as the user has set. To make this possible, we need to add a "shadow copies" of these variables (which stores the latest values set by the user). https://jira.mariadb.org/browse/MDEV-9519
* \|	MDEV-17262: mysql crashed on galera while node rejoined cluster (#895)	sysprg	2019-03-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch contains a fix for the MDEV-17262/17243 issues and new mtr test. These issues (MDEV-17262/17243) have two reasons: 1) After an intermediate commit, a transaction loses its status of "transaction that registered in the MySQL for 2pc coordinator" (in the InnoDB) due to the fact that since version 10.2 the write_row() function (which located in the ha_innodb.cc) does not call trx_register_for_2pc(m_prebuilt->trx) during the processing of split transactions. It is necessary to restore this call inside the write_row() when an intermediate commit was made (for a split transaction). Similarly, we need to set the flag of the started transaction (m_prebuilt->sql_stat_start) after intermediate commit. The table->file->extra(HA_EXTRA_FAKE_START_STMT) called from the wsrep_load_data_split() function (which located in sql_load.cc) will also do this, but it will be too late. As a result, the call to the wsrep_append_keys() function from the InnoDB engine may be lost or function may be called with invalid transaction identifier. 2) If a transaction with the LOAD DATA statement is divided into logical mini-transactions (of the 10K rows) and binlog is rotated, then in rare cases due to the wsrep handler re-registration at the boundary of the split, the last portion of data may be lost. Since splitting of the LOAD DATA into mini-transactions is technical, I believe that we should not allow these mini-transactions to fall into separate binlogs. Therefore, it is necessary to prohibit the rotation of binlog in the middle of processing LOAD DATA statement. https://jira.mariadb.org/browse/MDEV-17262 and https://jira.mariadb.org/browse/MDEV-17243
* \|	MDEV-18577: Indexes problem on import dump SQL	Jan Lindström	2019-03-13	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Problem was that we skipped background persistent statistics calculation on applier nodes if thread is marked as high priority (a.k.a BF). However, on applier nodes all DDL which is replicate will be executed as high priority i.e BF. Fixed by allowing background persistent statistics calculation on applier nodes even when thread is marked as BF. This could lead BF lock waits but for queries on that node needs that statistics.
* \|	MDEV-9519: Data corruption will happen on the Galera cluster size change	Julius Goryavsky	2019-02-26	1	-0/+8
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a 2+ node cluster which is replicating from an async master and the binlog_format is set to STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the server node generates wrong auto_increment values, which are different from what was generated on the async master. In the title of the MDEV-9519 it was proposed to ban start slave on a Galera if master binlog_format = statement and wsrep_auto_increment_control = 1, but the problem can be solved without such a restriction. The causes and fixes: 1. We need to improve processing of changing the auto-increment values after changing the cluster size. 2. If wsrep auto_increment_control switched on during operation of the node, then we should immediately update the auto_increment_increment and auto_increment_offset global variables, without waiting of the next invocation of the wsrep_view_handler_cb() callback. In the current version these variables retain its initial values if wsrep_auto_increment_control is switched on during operation of the node, which leads to inconsistent results on the different nodes in some scenarios. 3. If wsrep auto_increment_control switched off during operation of the node, then we must return the original values of the auto_increment_increment and auto_increment_offset global variables, as the user has set. To make this possible, we need to add a "shadow copies" of these variables (which stores the latest values set by the user). https://jira.mariadb.org/browse/MDEV-9519
*	fix failures of innodb_plugin tests in --embedded	Sergei Golubchik	2018-09-04	1	-0/+3
\| \| \| \| \| \|	Post-fix for 7e8ed15b95b Also, apply the same innodb fix to xtradb.
*	Fix to ensure updates in gtid_slave_state table do not get binlogged.	Nirbhay Choubey	2016-02-24	1	-1/+1
\| \| \| \| \|	Also, renamed wsrep_skip_append_keys to wsrep_ignore_table. Test case : galera.galera_as_slave_gtid.test
*	Merge branch 'github/10.0-galera' into 10.1	Sergei Golubchik	2015-12-22	1	-0/+3
\| \| \| \|	Note: some tests fail, just as they failed before the merge!
*	Merge branch '10.0-galera' into 10.1	Nirbhay Choubey	2015-07-14	1	-0/+3
\|
*	remove wsrep_hton dependency from innodb/xtradb	Sergei Golubchik	2015-01-08	1	-1/+1
\|
*	fixing embedded: WaaS. Wsrep as a Service.	Sergei Golubchik	2014-10-01	1	-0/+126