path: root/sql/rpl_parallel.cc
* MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker'
    (Sujatha, 2021-04-08, 1 file, -1/+53)

    Step 3:
    ======
    Preserve worker pool information on either STOP SLAVE/Error. In case
    STOP SLAVE is executed worker threads will be gone, hence worker
    threads will be unavailable. Querying the table at this stage will
    give empty rows. To address this case when worker threads are about
    to stop, due to an error or forced stop, create a backup pool and
    preserve the data which is relevant to populate performance schema
    table. Clear the backup pool upon slave start.
* MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker'
    (Sujatha, 2021-04-08, 1 file, -1/+6)

    Step2:
    =====
    Add two extra columns mentioned below.

    --------------------------------------------------------------------------
    | Column Name:            | Description:                                 |
    |------------------------------------------------------------------------|
    | WORKER_IDLE_TIME        | Total idle time in seconds that the worker   |
    |                         | thread has spent waiting for work from       |
    |                         | co-ordinator thread                          |
    |                         |                                              |
    | LAST_TRANS_RETRY_COUNT  | Total number of retries attempted by last    |
    |                         | transaction                                  |
    --------------------------------------------------------------------------
* MDEV-20220: Merge 5.7 P_S replication table 'replication_applier_status_by_worker'
    (Sujatha, 2021-04-08, 1 file, -0/+17)

    Step1:
    =====
    Backport 'replication_applier_status_by_worker' from upstream.

    Iterate through rpl_parallel_thread_pool and display slave worker
    thread specific information as part of the
    'replication_applier_status_by_worker' table.

    --------------------------------------------------------------------------
    | Column Name:           | Description:                                  |
    |------------------------------------------------------------------------|
    | CHANNEL_NAME           | Name of replication channel through which the |
    |                        | transaction is received.                      |
    |                        |                                               |
    | THREAD_ID              | Thread_Id as displayed in the                 |
    |                        | 'performance_schema.threads' table for the    |
    |                        | thread with name                              |
    |                        | 'thread/sql/rpl_parallel_thread'.             |
    |                        | THREAD_ID will be NULL when worker threads    |
    |                        | are stopped due to an error/force stop.       |
    |                        |                                               |
    | SERVICE_STATE          | Thread is running or not.                     |
    |                        |                                               |
    | LAST_SEEN_TRANSACTION  | Last GTID executed by worker.                 |
    |                        |                                               |
    | LAST_ERROR_NUMBER      | Last error that occurred on a particular      |
    |                        | worker.                                       |
    |                        |                                               |
    | LAST_ERROR_MESSAGE     | Last error specific message.                  |
    |                        |                                               |
    | LAST_ERROR_TIMESTAMP   | Time stamp of last error.                     |
    --------------------------------------------------------------------------

    CHANNEL_NAME will be empty when the worker has not processed any
    transaction. CHANNEL_NAME points to the valid source channel_name when
    it is processing a transaction/event group.
* Merge branch 'bb-10.4-release' into bb-10.5-release  (Sergei Golubchik, 2021-02-15, 1 file, -2/+3)
  * Merge branch 'bb-10.3-release' into bb-10.4-release  (Sergei Golubchik, 2021-02-12, 1 file, -2/+3)

      Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict
      resolution" was null-merged. The 10.4 version of the fix is coming up
      separately.
    * Merge branch '10.2' into 10.3  (Sergei Golubchik, 2021-02-01, 1 file, -2/+3)
      * MDEV-8134: The relay-log is not flushed after the slave-relay-log.999999 showed
          (Sujatha, 2021-01-21, 1 file, -2/+3)

          Problem:
          ========
          Auto purge of relay logs stops when the relay-log file is
          'slave-relay-log.999999' and slave_parallel_threads is enabled.

          Analysis:
          =========
          The problem is that in the Relay_log_info::inc_group_relay_log_pos()
          function, when two log names are compared via strcmp(), the result
          is correct as long as the log name sequence numbers have the same
          number of digits (6 digits). But when the number grows to 7 digits,
          999999 compares greater than 1000000, which is wrong, hence the bug.

          Fix:
          ====
          Extract the numeric extension part of the file name, convert it
          into an unsigned long and compare.

          Thanks to David Zhao for the contribution.
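          As a rough standalone illustration of the numeric-extension
          comparison described in the fix above (the helper names here are
          invented for the example, not the functions used in the server):

              #include <cstdio>
              #include <cstdlib>
              #include <cstring>

              // Return the numeric suffix of a log name such as
              // "slave-relay-log.1000000".
              static unsigned long log_name_number(const char *name)
              {
                const char *dot= strrchr(name, '.');
                return dot ? strtoul(dot + 1, nullptr, 10) : 0;
              }

              // Compare two log names of the same base by sequence number,
              // not by strcmp() over the whole string.
              static int compare_log_names(const char *a, const char *b)
              {
                unsigned long na= log_name_number(a), nb= log_name_number(b);
                return (na < nb) ? -1 : (na > nb) ? 1 : 0;
              }

              int main()
              {
                const char *a= "slave-relay-log.999999";
                const char *b= "slave-relay-log.1000000";
                // strcmp() claims a sorts after b, which is what stalled the
                // purge at the 7-digit rollover; the numeric compare is correct.
                printf("strcmp : %d\n", strcmp(a, b) > 0);            // prints 1
                printf("numeric: %d\n", compare_log_names(a, b) > 0); // prints 0
                return 0;
              }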
    * Merge branch '10.2' into 10.3  (tag: mariadb-10.3.24; Sergei Golubchik, 2020-08-06, 1 file, -4/+23)
      * Merge branch '10.1' into 10.2  (tag: mariadb-10.2.33; Sergei Golubchik, 2020-08-06, 1 file, -4/+23)
        * MDEV-23089 rpl_parallel2 fails in 10.5  (Sachin, 2020-08-04, 1 file, -4/+23)

            Problem:
            rpl_parallel2 was failing non-deterministically.

            Analysis:
            When FLUSH TABLES WITH READ LOCK is executed, it will allow all
            worker threads to complete their ongoing transactions and then it
            will pause them. At this state FTWRL will proceed to acquire the
            global read lock. FTWRL first blocks threads from starting new
            commits, then upgrades the lock to block commit of existing
            transactions.

            Step 1: FLUSH TABLES WITH READ LOCK - blocks new commits.
            Step 2:
              * The STOP SLAVE command enables 'force_abort=1', which
                unblocks workers, so they continue to execute events.
              * T1: waits in the 'record_gtid' call to update the
                'gtid_slave_pos' table with its current GTID, but it is
                blocked because of Step 1.
              * T2: holds the COMMIT lock and waits for T1 to commit.
            Step 3: FLUSH TABLES WITH READ LOCK - waiting to get BLOCK_COMMIT.

            This results in a deadlock. When the STOP SLAVE command allows
            paused workers to proceed, the workers should skip the execution
            of all further events, similar to the 'conservative' parallel
            mode.

            Solution:
            Assign 1 to skip_event_group when we are aborted in do_ftwrl_wait.
            rpl_parallel_entry->pause_sub_id is only reset when force_abort is
            off in rpl_pause_after_ftwrl.
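            A hedged sketch of the control flow behind that solution; the
            struct and function names below are invented stand-ins, not the
            real rpl_parallel.cc code:

                // Sketch only: when the FTWRL pause wait is aborted because
                // STOP SLAVE set force_abort, the worker marks the rest of
                // its event group to be skipped, so it can no longer block
                // in record_gtid() or on the COMMIT lock and deadlock
                // against FTWRL's BLOCK_COMMIT step.
                struct worker_sketch
                {
                  bool skip_event_group;   // skip remaining events of the group
                };

                static void do_ftwrl_wait_sketch(worker_sketch *w,
                                                 bool aborted_by_stop_slave)
                {
                  // ... normally: wait here while FTWRL pauses the pool ...
                  if (aborted_by_stop_slave)
                    w->skip_event_group= true;
                }

                static void apply_event_sketch(worker_sketch *w /*, Log_event *ev */)
                {
                  if (w->skip_event_group)
                    return;   // keep the wait/wakeup bookkeeping, but execute
                              // nothing further for this group
                  // ... execute the replication event ...
                }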
        * MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL
            (Andrei Elkin, 2020-05-26, 1 file, -3/+3)

            The immediate bug was caused by a failure to recognize the correct
            position at which to stop the slave applier run in optimistic
            parallel mode. The analysis unveiled the following set of issues:

            1. An incorrect estimate for the event binlog position passed to
               is_until_satisfied().
            2. The driver thread's wait for workers to complete did not
               account for non-group events that could be left unprocessed
               and thus mix up the last executed binlog group's file and
               position: the file remained old while the position related to
               the newly rotated file.
            3. An incorrect 'slave reached file:pos' report by the parallel
               slave in the error log.
            4. Relay-log UNTIL missed out the parallel slave branch in
               is_until_satisfied().

            The patch addresses all of them to simplify the logic of log
            change notification in both the master and relay-log UNTIL cases.
            P.1 is addressed by passing the event into is_until_satisfied()
            for proper analysis by the function.
            P.2 is fixed by changes in handle_queued_pos_update().
            P.4 required removing relay-log change notification by workers.
            Instead, the driver thread updates the notion of the current
            relay-log fully itself, with the aid of the introduced
            bool Relay_log_info::until_relay_log_names_defer.

            An extra printout of the requested until file:pos is arranged
            with --log-warning=3.
* Merge branch '10.4' into 10.5  (Oleksandr Byelkin, 2020-08-04, 1 file, -4/+23)
  * MDEV-23089 rpl_parallel2 fails in 10.5  (Sachin, 2020-08-03, 1 file, -4/+23)

      Problem:
      rpl_parallel2 was failing non-deterministically.

      Analysis:
      When FLUSH TABLES WITH READ LOCK is executed, it will allow all
      worker threads to complete their ongoing transactions and then it
      will pause them. At this state FTWRL will proceed to acquire the
      global read lock. FTWRL first blocks threads from starting new
      commits, then upgrades the lock to block commit of existing
      transactions.

      Step 1: FLUSH TABLES WITH READ LOCK - blocks new commits.
      Step 2:
        * The STOP SLAVE command enables 'force_abort=1', which unblocks
          workers, so they continue to execute events.
        * T1: waits in the 'record_gtid' call to update the
          'gtid_slave_pos' table with its current GTID, but it is blocked
          because of Step 1.
        * T2: holds the COMMIT lock and waits for T1 to commit.
      Step 3: FLUSH TABLES WITH READ LOCK - waiting to get BLOCK_COMMIT.

      This results in a deadlock. When the STOP SLAVE command allows
      paused workers to proceed, the workers should skip the execution of
      all further events, similar to the 'conservative' parallel mode.

      Solution:
      Assign 1 to skip_event_group when we are aborted in do_ftwrl_wait.
      rpl_parallel_entry->pause_sub_id is only reset when force_abort is
      off in rpl_pause_after_ftwrl.
* Merge 10.4 into 10.5  (Marko Mäkelä, 2020-07-21, 1 file, -1/+1)
  * MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel repl.
      (Monty, 2020-07-21, 1 file, -1/+1)

      The issue was: T1, a parallel slave worker thread, is waiting for
      another worker thread to commit. While waiting, it holds the
      MDL_BACKUP_COMMIT lock. T2, working for mariabackup, is doing
      BACKUP STAGE BLOCK_COMMIT and blocks all commits. This causes a
      deadlock, as the thread T1 is waiting for can't commit.

      Fixed by moving the locking of MDL_BACKUP_COMMIT from
      ha_commit_trans() to commit_one_phase_2().

      Other things:
      - Added a new argument to ha_commit_one_phase() to signal if the
        transaction was a write transaction.
      - Ensured that ha_maria::implicit_commit() is always called under
        MDL_BACKUP_COMMIT. This code is not needed in 10.5.
      - Ensured that the MDL_Request values 'type' and 'ticket' are always
        initialized. This makes it easier to check the state of the
        MDL_Request.
      - Moved thd->store_globals() earlier in handle_rpl_parallel_thread(),
        as thd->init_for_queries() could use an MDL that could crash if
        store_globals had not been called.
      - Don't call ha_enable_transactions() in THD::init_for_queries(), as
        this is both slow (uses MDL locks) and not needed.
* Merge 10.4 into 10.5  (Marko Mäkelä, 2020-06-18, 1 file, -0/+15)
  * MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL
      (Sachin, 2020-06-17, 1 file, -0/+15)

      Problem:
      When we issue FTWRL with shutdown in parallel, there is a race
      between FTWRL and shutdown. Shutdown might destroy the mutex
      (pool->LOCK_rpl_thread_pool) before FTWRL can lock it, so we can get
      a crash on the FTWRL thread.

      Solution:
      mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for the
      FTWRL thread to complete its work, and only then destroy. So
      slave_prepare_for_shutdown() will just deactivate the pool, and the
      mutex is destroyed later in end_slave().
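      A sketch of the shutdown ordering described above, using plain
      pthread primitives and invented names rather than the real pool code:

          #include <pthread.h>

          struct rpl_pool_sketch
          {
            pthread_mutex_t lock;   // stands in for pool->LOCK_rpl_thread_pool,
                                    // assumed initialized elsewhere
            bool inited;
          };

          // Early shutdown step: only deactivate. An FTWRL thread may still
          // be about to take the lock, so it must not be destroyed here.
          static void slave_prepare_for_shutdown_sketch(rpl_pool_sketch *pool)
          {
            pthread_mutex_lock(&pool->lock);
            pool->inited= false;
            pthread_mutex_unlock(&pool->lock);
          }

          // Late shutdown step: by now every user of the pool (including
          // FTWRL) has finished, so destroying the mutex cannot race with a
          // pending lock attempt.
          static void end_slave_sketch(rpl_pool_sketch *pool)
          {
            pthread_mutex_destroy(&pool->lock);
          }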
* Merge 10.4 into 10.5  (Marko Mäkelä, 2020-05-31, 1 file, -3/+3)
  * Merge 10.3 into 10.4  (Marko Mäkelä, 2020-05-30, 1 file, -3/+3)
    * Merge 10.2 into 10.3  (Marko Mäkelä, 2020-05-27, 1 file, -3/+3)
      * MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTIL
          (Andrei Elkin, 2020-05-26, 1 file, -3/+3)

          The immediate bug was caused by a failure to recognize the correct
          position at which to stop the slave applier run in optimistic
          parallel mode. The analysis unveiled the following set of issues:

          1. An incorrect estimate for the event binlog position passed to
             is_until_satisfied().
          2. The driver thread's wait for workers to complete did not
             account for non-group events that could be left unprocessed
             and thus mix up the last executed binlog group's file and
             position: the file remained old while the position related to
             the newly rotated file.
          3. An incorrect 'slave reached file:pos' report by the parallel
             slave in the error log.
          4. Relay-log UNTIL missed out the parallel slave branch in
             is_until_satisfied().

          The patch addresses all of them to simplify the logic of log
          change notification in both the master and relay-log UNTIL cases.
          P.1 is addressed by passing the event into is_until_satisfied()
          for proper analysis by the function.
          P.2 is fixed by changes in handle_queued_pos_update().
          P.4 required removing relay-log change notification by workers.
          Instead, the driver thread updates the notion of the current
          relay-log fully itself, with the aid of the introduced
          bool Relay_log_info::until_relay_log_names_defer.

          An extra printout of the requested until file:pos is arranged
          with --log-warning=3.
* Merge 10.4 into 10.5  (Marko Mäkelä, 2020-04-25, 1 file, -1/+1)

    The functional changes of commit 5836191c8f0658d5d75484766fdcc3d838b0a5c1
    (MDEV-21168) are omitted due to MDEV-742 having addressed the issue.
  * Relay_log_info::executed_entries to Atomic_counter  (Sergey Vojtovich, 2020-04-15, 1 file, -1/+1)
* Merge 10.4 into 10.5  (Marko Mäkelä, 2020-03-27, 1 file, -1/+1)
  * dequeued_count my_atomic to Atomic_counter  (Sergey Vojtovich, 2020-03-25, 1 file, -1/+1)

      Also allocate inuse_relaylog with new rather than my_malloc(MY_ZEROFILL).
* Fix various spelling errors  (Otto Kekäläinen, 2020-03-16, 1 file, -1/+1)

    e.g.
    - dont -> don't
    - occurence -> occurrence
    - succesfully -> successfully
    - easyly -> easily

    Also remove trailing space in selected files.

    These changes span:
    - server core
    - Connect and Innobase storage engine code
    - OQgraph, Sphinx and TokuDB storage engines

    Related to MDEV-21769.
* MDEV-742 XA PREPAREd transaction survive disconnect/server restart
    (Andrei Elkin, 2020-03-14, 1 file, -13/+25)

    Lifted the long-standing limitation of XA of rolling it back at the
    transaction's connection close even if the XA is prepared. A prepared
    XA transaction is made to sustain connection close or server restart.

    The patch consists of:

    - A binary logging extension to write the prepared XA part of the
      transaction, signified with its XID, in a new XA_prepare_log_event.
      The conclusion part - with the Commit or Rollback decision - is
      logged separately as a Query_log_event. That is, in the binlog the
      XA consists of two separate groups of events. That makes the whole
      XA possibly interleave in the binlog with other XA:s or regular
      transactions, but with no harm to replication and data consistency.

      Gtid_log_event receives two more flags to identify which of the two
      XA phases of the transaction it represents. With either flag set,
      XID info is also added to the event.

      When binlog is ON on the server, XID::formatID is constrained to
      4 bytes.

    - Engines are made aware of the server policy to keep up user-prepared
      XA:s, so they (Innodb, rocksdb) don't roll them back anymore in
      their disconnect methods.

    - The slave applier is refined to cope with two-phase logged XA:s,
      including parallel modes of execution.

    This patch does not address crash-safe logging of the new events,
    which is being addressed by MDEV-21469.

    CORNER CASES: read-only, pure myisam, binlog-*, @@skip_log_bin, etc.
    These are addressed along the following policies:

    1. The read-only at reconnect marks the XID to fail for future
       completion with ER_XA_RBROLLBACK.
    2. A binlog-* filtered XA, when it changes engine data, is regarded as
       loggable even when nothing got cached for the binlog. An empty
       XA-prepare group is recorded. The consequent Commit-or-Rollback
       succeeds in the Engine(s) as well as being recorded into the binlog.
    3. The same applies to the non-transactional engine XA.
    4. @@skip_log_bin=OFF does not record anything at XA-prepare
       (obviously), but the completion event is recorded into the binlog to
       admit inconsistency with the slave.

    The following actions are taken by the patch.

    At XA-prepare: when the binlog cache is empty, don't do anything to
    the binlog if RO, otherwise write an empty XA_prepare
    (assert(binlog-filter case)).

    At Disconnect: when Prepared && RO (=> no binlogging was done), set
    Xid_cache_element::error := ER_XA_RBROLLBACK, *keep* the XID in the
    cache, and roll back the transaction.

    At XA-"complete": discover the error, if any don't binlog the
    "complete", and return the error to the user.

    Kudos
    -----
    Alexey Botchkov took to drive this work initially. Sergei Golubchik,
    Sergei Petrunja, Marko Mäkelä provided a number of good
    recommendations. Sergei Voitovich made a magnificent review and
    improvements to the code. They all deserve a bunch of thanks for
    making this work done!
* cleanup: PSI key is *always* the first argument  (Sergei Golubchik, 2020-03-10, 1 file, -2/+2)
* perfschema memory related instrumentation changes  (Sergei Golubchik, 2020-03-10, 1 file, -5/+5)
* MDEV-6860 Parallel async replication hangs (#1400)  (seppo, 2019-10-16, 1 file, -0/+31)

    Instrumenting parallel slave worker thread with wsrep replication
    hooks. Added mtr test for testing parallel slave support. The test is
    based on the test attached in the MDEV-6860 jira tracker.
* Merge remote-tracking branch 'origin/10.3' into 10.4  (Alexander Barkov, 2019-10-01, 1 file, -0/+16)
  * Merge remote-tracking branch 'origin/10.2' into 10.3  (Alexander Barkov, 2019-10-01, 1 file, -0/+16)
    * Merge remote-tracking branch 'origin/10.1' into 10.2  (Alexander Barkov, 2019-10-01, 1 file, -0/+16)
      * MDEV-20645: Replication consistency is broken as workers miss the error notification from an earlier failed group
          (Sujatha, 2019-09-30, 1 file, -0/+16)

          Analysis:
          ========
          In general, suppose there are three groups:
          1 - Inserts 32, which fails due to a local entry '32' on the slave.
          2 - Inserts 33.
          3 - Inserts 34.

          Each group considers itself a waiter, and it waits for the prior
          group, its 'waitee'. This is done in
          'register_wait_for_prior_event_group_commit'. If there is no other
          parallel group being scheduled, then no waitee will be there.

          Let us assume 3 groups are being scheduled in parallel:
          3 -> waits for 2 -> waits for 1.

          '1', upon completion, checks whether there is any registered
          subsequent waiter. If so, it wakes up the subsequent waiter with
          its execution status. This execution status is stored in
          wakeup_error. If '1' failed, then it sends the corresponding
          wakeup_error to 2. Then '2' aborts and propagates the error to
          '3'. So all further commits are aborted. This mechanism works only
          when all transactions reach a stage where they are waiting for
          their prior commit to complete.

          In the optimistic case the following scenario occurs:

          1, 2, 3 are scheduled in parallel.
          3 - Reaches the group commit code and waits for 2 to complete.
          1 - Errors out and sets stop_on_error_sub_id=1.

          When a group execution results in an error, its corresponding
          sub_id is set to 'stop_on_error_sub_id'. Any new groups queued for
          execution will check if their sub_id is > stop_on_error_sub_id. If
          it is true, their execution will be skipped as the prior group
          execution failed: 'skip_event_group=1' will be set. Since the
          execution of the SQL thread is about to stop, we just skip
          execution of all the following event groups. We still do all the
          normal waiting and wakeup processing between the event groups as a
          simple way to ensure that everything is stopped and cleaned up
          correctly.

          Upon error, transaction '1' checks for registered waiters. Since
          no one is there, it simply goes away.

          2 - Starts execution. It checks whether it has a waitee. Since
          wait_commit_sub_id == entry->last_committed_sub_id, no waitee is
          set. Secondly, 'entry->stop_on_error_sub_id' was set by the
          execution of '1'. Now the 'handle_parallel_thread' code checks if
          the current group's 'sub_id' is greater than the 'sub_id' set
          within 'stop_on_error_sub_id'. Since the above is true,
          'skip_event_group=true' is set and 'wait_for_prior_commit' is
          simply called to wake up all waiters. Group '2' didn't have any
          waitee and its execution was skipped. Hence its wakeup_error=0.
          It sends a positive wakeup signal to '3', which commits. This
          results in a missed transaction: 33 is missed and 34 is committed.

          Fix:
          ===
          When a worker learns that an earlier transaction execution has
          failed, and it should not proceed for further execution, it should
          mark its own execution status as failed so that it alerts its
          followers to abort as well.
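          A minimal sketch, with invented names, of the wait/wakeup status
          propagation described above; the marked line corresponds to the
          fix (a skipped group must not report success to its follower):

              #include <cstdint>

              struct group_sketch
              {
                uint64_t sub_id;                // position in commit order
                uint64_t stop_on_error_sub_id;  // sub_id of the first failed
                                                // group, UINT64_MAX if none
              };

              // Status a group hands to the follower it wakes up:
              // 0 = ok to commit, non-zero = abort.
              static int wakeup_status_sketch(const group_sketch *g,
                                              int own_error,
                                              int error_from_waitee)
              {
                if (own_error || error_from_waitee)
                  return 1;                     // failure propagates down the chain
                if (g->sub_id > g->stop_on_error_sub_id)
                  return 1;                     // the fix: this group was skipped
                                                // after an earlier failure, so do
                                                // not signal success to the follower
                return 0;                       // clean commit, follower may proceed
              }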
* Move THD list handling to THD_list  (Sergey Vojtovich, 2019-01-28, 1 file, -2/+2)

    Implemented and integrated THD_list as a replacement for the global
    thread list. It uses its own mutex instead of LOCK_thread_count for
    THD list protection.

    Removed unused first_global_thread() and next_global_thread().

    delayed_insert_threads is now protected by LOCK_delayed_insert,
    although this patch doesn't fix the very wrong synchronization of this
    variable.

    After this patch there are only 2 legitimate uses of LOCK_thread_count
    left, both in mysqld.cc: thread_count and ready_to_exit.

    The aim is to reduce usage of LOCK_thread_count and COND_thread_count.

    Part of MDEV-15135.
* Merge 10.2 into 10.3  (Marko Mäkelä, 2018-10-11, 1 file, -11/+21)
  * Merge 10.1 into 10.2  (Marko Mäkelä, 2018-10-11, 1 file, -11/+21)
    * MDEV-17346 parallel slave start and stop races to workers disappeared
        (Andrei Elkin, 2018-10-08, 1 file, -3/+22)

        The bug appears as a slave SQL thread hanging in
        rpl_parallel_thread_pool::get_thread() while there are no slave
        worker threads to wake it.

        The reason for the hang is that at parallel slave worker pool
        activation, the SQL thread being started could read the worker pool
        size concurrently with pool deactivation, and when reading it the
        SQL thread did not employ the necessary protection from the race.

        Fixed by making the SQL thread, at pool activation, first grab the
        same lock that a potential deactivator also takes prior to
        accessing the pool size.
* MDEV-16286 Killed CREATE SEQUENCE leaves sequence in unusable state
    (Monty, 2018-05-27, 1 file, -8/+4)

    Fixed by deleting the sequence if we were not able to initialize it.

    I also noticed that we didn't always set the error message when
    check_killed() detected a kill, which could lead to aborted queries
    without the error being properly set. Fixed by setting a default error
    message if check_error() noticed that killed had been called. This
    allowed me to remove a lot of calls to thd->send_kill_message().
* MDEV-13134 Introduce ALTER TABLE attributes ALGORITHM=NOCOPY and ALGORITHM=INSTANT
    (Thirunarayanan Balathandayuthapani, 2018-05-07, 1 file, -0/+2)

    Introduced new alter algorithm types called NOCOPY & INSTANT for
    inplace alter operations.

    NOCOPY - Algorithm refuses any alter operation that would rebuild the
    clustered index. It is a subset of the INPLACE algorithm.

    INSTANT - Algorithm allows any alter operation that would modify only
    metadata. It is a subset of the NOCOPY algorithm.

    Introduce a new variable called alter_algorithm. The values are
    DEFAULT(0), COPY(1), INPLACE(2), NOCOPY(3), INSTANT(4).

    Added a message to deprecate the old_alter_table variable and make it
    an alias for the alter_algorithm variable.

    The alter_algorithm variable for the slave is always set to the default.
* Add likely/unlikely to speed up execution  (Monty, 2018-05-07, 1 file, -11/+11)

    Added to:
    - if (error)
    - Lex
    - sql_yacc.yy and sql_yacc_ora.yy
    - In header files to alloc() calls
    - Added thd argument to thd_net_is_killed()
* Merge branch '10.2' into 10.3  (Sergei Golubchik, 2018-03-28, 1 file, -1/+36)
  * MDEV-12746 rpl.rpl_parallel_optimistic_nobinlog fails committing out of order at retry
      (Andrei Elkin, 2018-03-13, 1 file, -2/+37)

      The test failures were of two sorts. One is that the number of
      retries of what the slave thought was a temporary error exceeded the
      default value of the slave retry option. The second issue was an
      out-of-order commit by transactions that were supposed to error out
      instead.

      Both issues are caused by the same reason: the post-temporary-error
      retry did not check a possibly already existing error status.

      This is mended by refining the conditions to retry. Specifically, a
      retrying worker checks 'rpl_parallel_entry::stop_on_error_sub_id',
      which a potentially failing predecessor could set to its own sub id.
      Now, should the member be set, the retrying follower errors out with
      ER_PRIOR_COMMIT_FAILED.
* Changed database, tablename and alias to be LEX_CSTRING  (Monty, 2018-01-30, 1 file, -1/+1)

    This was done in, among other things:
    - thd->db and thd->db_length
    - TABLE_LIST tablename, db, alias and schema_name
    - Audit plugin database name
    - lex->db
    - All db and table names in Alter_table_ctx
    - st_select_lex db

    Other things:
    - Changed a lot of functions to take const LEX_CSTRING* as argument
      for db, table_name and alias. See init_one_table() as an example.
    - Changed some function arguments from LEX_CSTRING to const LEX_CSTRING
    - Changed some lists from LEX_STRING to LEX_CSTRING
    - threads_mysql.result changed because process list_db wasn't always
      correctly updated
    - New append_identifier() function that takes LEX_CSTRING* as arguments
    - Added new element tmp_buff to Alter_table_ctx to separate temp name
      handling from temporary space
    - Ensure we store the length after my_casedn_str() of table/db names
    - Removed not used version of rename_table_in_stat_tables()
    - Changed Natural_join_column::table_name and db_name() to never
      return NULL (used for print)
    - thd->get_db() now returns db as a printable string (thd->db.str or "")
* Merge remote-tracking branch 'origin/bb-10.2-ext' into 10.3  (Alexander Barkov, 2018-01-29, 1 file, -1/+10)
  * Fix for MDEV-12730  (Monty, 2018-01-24, 1 file, -1/+10)

      Assertion `count > 0' failed in rpl_parallel_thread_pool::get_thread,
      rpl.rpl_parallel failed in buildbot.

      The reason for this is that one thread can call
      rpl_parallel_resize_pool_if_no_slaves() while another thread calls
      rpl_parallel_activate_pool() at the same time. If
      rpl_parallel_activate_pool() is called before
      rpl_parallel_resize_pool_if_no_slaves() has finished, pool->count
      will be set to 0 even if there exist active slave threads.

      Added a mutex lock in rpl_parallel_activate_pool() to protect against
      this scenario, which seems to fix this issue.
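      A sketch of the race and the added protection, using pthread
      primitives and invented names rather than the real rpl_parallel_*
      functions:

          #include <pthread.h>

          struct pool_sketch
          {
            pthread_mutex_t lock;   // assumed initialized at pool creation
            unsigned count;         // number of worker threads in the pool
          };

          // Activation path: now runs under the pool lock, so it cannot
          // interleave with a concurrent "resize to zero" and leave
          // count == 0 while slaves are active.
          static void activate_pool_sketch(pool_sketch *pool, unsigned wanted)
          {
            pthread_mutex_lock(&pool->lock);
            if (pool->count < wanted)
              pool->count= wanted;
            pthread_mutex_unlock(&pool->lock);
          }

          // Shrink path: only empties the pool when no slaves are running,
          // under the same lock as activation.
          static void resize_pool_if_no_slaves_sketch(pool_sketch *pool,
                                                      unsigned active_slaves)
          {
            pthread_mutex_lock(&pool->lock);
            if (active_slaves == 0)
              pool->count= 0;
            pthread_mutex_unlock(&pool->lock);
          }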
* Changed from using LOCK_log to LOCK_binlog_end_pos for binary log
    (Monty, 2017-12-18, 1 file, -3/+1)

    Part of MDEV-13073 AliSQL Optimize performance of semisync.

    The idea is to use a dedicated lock for detecting whether there is new
    data in the master's binary log, instead of the overused LOCK_log.

    Changes:
    - Use dedicated COND variables for the relay and binary log signaling.
      This was needed as the old 'update_cond' variable was used with
      different mutexes, which could cause deadlocks.
    - Relay log now uses COND_relay_log_updated and LOCK_log
    - Binary log now uses COND_bin_log_updated and LOCK_binlog_end_pos
    - Renamed signal_cnt to relay_signal_cnt (as we now have two signals)
    - Added some missing error handling in MYSQL_BIN_LOG::new_file_impl()
    - Reformatted some comments with old style
    - Renamed m_key_LOCK_binlog_end_pos to key_LOCK_binlog_end_pos
    - Changed 'signal_update()' to update_binlog_end_pos(), which works for
      both the relay and binary log
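    A sketch of the signalling pattern this change moves to: a dedicated
    mutex/condition pair for "binary log end position advanced", so waiters
    do not contend on the much hotter LOCK_log. The member names mirror the
    ones listed above, but the code itself is illustrative, not the server
    implementation:

        #include <pthread.h>

        struct binlog_end_pos_sketch
        {
          pthread_mutex_t LOCK_binlog_end_pos;   // assumed initialized at startup
          pthread_cond_t  COND_bin_log_updated;
          unsigned long long end_pos;
        };

        // Writer side: after appending to the binary log, publish the new
        // end position and wake up threads waiting for more data.
        static void update_binlog_end_pos_sketch(binlog_end_pos_sketch *log,
                                                 unsigned long long new_pos)
        {
          pthread_mutex_lock(&log->LOCK_binlog_end_pos);
          log->end_pos= new_pos;
          pthread_cond_broadcast(&log->COND_bin_log_updated);
          pthread_mutex_unlock(&log->LOCK_binlog_end_pos);
        }

        // Reader side: wait until the log has grown past what was already
        // sent, without ever touching LOCK_log.
        static void wait_for_update_sketch(binlog_end_pos_sketch *log,
                                           unsigned long long sent_pos)
        {
          pthread_mutex_lock(&log->LOCK_binlog_end_pos);
          while (log->end_pos <= sent_pos)
            pthread_cond_wait(&log->COND_bin_log_updated,
                              &log->LOCK_binlog_end_pos);
          pthread_mutex_unlock(&log->LOCK_binlog_end_pos);
        }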
* Removed not used lock argument from read_log_event  (Monty, 2017-12-18, 1 file, -1/+1)
* Merge bb-10.2-ext into 10.3  (Marko Mäkelä, 2017-10-04, 1 file, -1/+1)
  * MDEV-13384 - misc Windows warnings fixed  (Vladislav Vaintroub, 2017-09-28, 1 file, -1/+1)