summaryrefslogtreecommitdiff
path: root/sql/slave.cc
Commit message (Collapse)AuthorAgeFilesLines
* MDEV-23328 Server hang due to Galera lock conflict resolutionSergei Golubchik2021-02-121-1/+1
| | | | adaptation of 29bbcac0ee8 for 10.4
* don't take mutexes conditionallySergei Golubchik2021-02-121-8/+2
|
* Merge branch 'bb-10.3-release' into bb-10.4-releaseSergei Golubchik2021-02-121-211/+66
|\ | | | | | | | | Note, the fix for "MDEV-23328 Server hang due to Galera lock conflict resolution" was null-merged. 10.4 version of the fix is coming up separately
| * Merge branch '10.2' into 10.3Sergei Golubchik2021-02-011-178/+45
| |\
| | * cleanup: remove slave background thread, use handle_manager thread insteadSergei Golubchik2021-01-241-141/+31
| | |
| | * MDEV-10272: add master host/port info to slave thread exit messagesDaniel Black2021-01-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sample log error message generated: 2021-01-21 2:33:24 139912137520896 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 369 33:24 139912137520896 [Note] master was 127.0.0.1:16400 2021-01-21 2:33:24 139912137828096 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 369 2021-01-21 2:33:24 139912137828096 [Note] master was 127.0.0.1:16400 Based on work by Hartmut Holzgraefe. Reviewer: knielsen@knielsen-hq.org, Andrei, Sachin
* | | Merge branch '10.3' into 10.4Sujatha2020-11-121-2/+2
|\ \ \ | |/ /
| * | Merge branch '10.2' into 10.3Sujatha2020-11-121-4/+15
| |\ \ | | |/
| | * MDEV-4633: multi_source.simple test fails sporadicallySujatha2020-11-121-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Analysis: ======== Writes to 'rli->log_space_total' needs to be synchronized, otherwise both SQL_THREAD and IO_THREAD can try to modify the variable simultaneously resulting in incorrect rli->log_space_total. In the current test scenario SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously. Hence test occasionally fails with result mismatch. Fix: === Convert 'rli->log_space_total' variable to atomic type.
* | | Merge 10.3 into 10.4Marko Mäkelä2020-07-311-1/+2
|\ \ \ | |/ /
| * | Merge 10.2 into 10.3Marko Mäkelä2020-07-311-1/+2
| |\ \ | | |/
| | * MDEV-14203: rpl.rpl_extra_col_master_myisam, ↵Sujatha2020-07-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rpl.rpl_slave_load_tmpdir_not_exist failed in buildbot with a warning Problem: ======= rpl.rpl_slave_load_tmpdir_not_exist 'stmt' w3 [ fail ] Found warnings/errors in server log file! Test ended at 2017-09-27 20:34:55 [Warning] Master is configured to log replication events with checksum, but will not send such events to slaves that cannot process them ^ Found warnings in /mnt/buildbot/build/mariadb-10.2.10/mysql-test/var/3/log/mysqld.1.err ok Analysis: ======== When slave tries to connect to master 'get_master_version_and_clock' function is invoked to perform elaborated slave-master handshake. During this process slave server queries master server, to know if it is checksum aware and at the same time master is notified about its CRC-awareness. The master's side instant value of @@global.binlog_checksum is stored in the dump thread's uservar area as well as cached locally to become known in consensus by master and slave. Post hand-shake slave requests master for binlog dump. It sends 'COM_BINLOG_DUMP'. This command is sent to master by 'cli_advanced_command' call. If there is some temporary network failure during this request_dump call, 'end_server' is invoked to close the current connection between master and slave. Upon connection close the dump thread on the master gets terminated and it clears the 'uservar' data it got through master-slave handshake. The 'COM_BINLOG_DUMP' command is sent once again without master-slave handshake. Since the checksum data is not available with new dump thread a warning gets reported. Fix: === Upon network write error donot attempt reconnect, proceed to master-slave handshake. This ensures that master is aware of slave's capability to use checksums.
* | | MDEV-22370 safe_mutex: Trying to lock uninitialized mutex at ↵Sachin2020-06-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /data/src/10.4-bug/sql/rpl_parallel.cc, line 470 upon shutdown during FTWRL Problem:- When we issue FTWRL with shutdown in parallel, there is race between FTWRL and shutdown. Shutdown might destroy the mutex (pool->LOCK_rpl_thread_pool) before FTWRL can lock it. So we can get crash on FTWRL thread Solution:- mysql_mutex_destroy(pool->LOCK_rpl_thread_pool) should wait for FTWRL thread to complete its work , and then destroy. So slave_prepare_for_shutdown will just deactivate the pool, and mutex is destroyed later in end_slave()
* | | Merge 10.3 into 10.4Marko Mäkelä2020-05-301-7/+24
|\ \ \ | |/ /
| * | Merge 10.2 into 10.3Marko Mäkelä2020-05-271-7/+24
| |\ \ | | |/
| | * MDEV-15152 Optimistic parallel slave doesnt cope well with START SLAVE UNTILAndrei Elkin2020-05-261-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The immediate bug was caused by a failure to recognize a correct position to stop the slave applier run in optimistic parallel mode. There were the following set of issues that the analysis unveil. 1 incorrect estimate for the event binlog position passed to is_until_satisfied 2 wait for workers to complete by the driver thread did not account non-group events that could be left unprocessed and thus to mix up the last executed binlog group's file and position: the file remained old and the position related to the new rotated file 3 incorrect 'slave reached file:pos' by the parallel slave report in the error log 4 relay log UNTIL missed out the parallel slave branch in is_until_satisfied. The patch addresses all of them to simplify logics of log change notification in either the master and relay-log until case. P.1 is addressed with passing the event into is_until_satisfied() for proper analisis by the function. P.2 is fixed by changes in handle_queued_pos_update(). P.4 required removing relay-log change notification by workers. Instead the driver thread updates the notion of the current relay-log fully itself with aid of introduced bool Relay_log_info::until_relay_log_names_defer. An extra print out of the requested until file:pos is arranged with --log-warning=3.
* | | Merge 10.3 into 10.4Marko Mäkelä2020-05-051-0/+1
|\ \ \ | |/ /
| * | Merge branch '10.2' into 10.3Oleksandr Byelkin2020-05-041-0/+1
| |\ \ | | |/
| | * Merge branch '10.1' into 10.2Oleksandr Byelkin2020-05-021-0/+1
| | |\
| | | * Merge branch '5.5' into 10.1Oleksandr Byelkin2020-04-301-0/+1
| | | |\
| | | | * Bug#29915479 RUNNING COM_REGISTER_SLAVE WITHOUT COM_BINLOG_DUMP CAN RESULTS ↵Sergei Golubchik2020-04-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IN SERVER EXIT in fact, in MariaDB it cannot, but it can show spurious slaves in SHOW SLAVE HOSTS. slave was registered in COM_REGISTER_SLAVE and un-registered after COM_BINLOG_DUMP. If there was no COM_BINLOG_DUMP, it would never unregister.
| | | | * cleanup: remove dbug keywords that are never usedSergei Golubchik2020-04-291-33/+0
| | | | |
| * | | | Merge 10.2 into 10.3Marko Mäkelä2020-04-271-3/+5
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.1 into 10.2Marko Mäkelä2020-04-271-3/+5
| | |\ \ \ | | | |/ /
| | | * | MDEV-22203: WSREP_ON is unnecessarily expensive to evaluateMarko Mäkelä2020-04-271-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a backport of the applicable part of commit 93475aff8de80a0ef53cbee924bcb70de6e86f2c and commit 2c39f69d34e64a5cf94720e82e78c0ee91bd4649 from 10.4. Before 10.4 and Galera 4, WSREP_ON is a macro that points to a global Boolean variable, so it is not that expensive to evaluate, but we will add an unlikely() hint around it. WSREP_ON_NEW: Remove. This macro was introduced in commit c863159c320008676aff978a7cdde5732678f975 when reverting WSREP_ON to its previous definition. We replace some use of WSREP_ON with WSREP(thd), like it was done in 93475aff8de80a0ef53cbee924bcb70de6e86f2c. Note: the macro WSREP() in 10.1 is equivalent to WSREP_NNULL() in 10.4. Item_func_rand::seed_random(): Avoid invoking current_thd when WSREP is not enabled.
* | | | | MDEV-22203: WSREP_ON is unnecessarily expensive to evaluateJan Lindström2020-04-241-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replaced WSREP_ON macro by single global variable WSREP_ON that is then updated at server statup and on wsrep_on and wsrep_provider update functions.
* | | | | Relay_log_info::executed_entries to Atomic_counterSergey Vojtovich2020-04-151-2/+2
| | | | |
* | | | | Merge 10.3 into 10.4Marko Mäkelä2020-03-301-1/+1
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.2 into 10.3Marko Mäkelä2020-03-301-0/+1
| |\ \ \ \ | | |/ / /
| | * | | MDEV-21473 conflicts with async slave BF aborting (#1475)seppo2020-03-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If async slave thread (slave SQL handler), becomes a BF victim, it may occasionally happen that rollbacker thread is used to carry out the rollback instead of the async slave thread. This can happen, if async slave thread has flagged "idle" state when BF thread tries to figure out how to kill the victim. The issue was possible to test by using a galera cluster as slave for external master, and issuing high load of conflicting writes through async replication and directly against galera cluster nodes. However, a deterministic mtr test for the "conflict window" has not yet been worked on. The fix, in this patch makes sure that async slave thread state is never set to IDLE. This prevents the rollbacker thread to intervene. The wsrep_query_state change was refactored to happen by dedicated function to make controlling the idle state change in one place.
* | | | | gtid_pos_table: my_atomic to std::atomicSergey Vojtovich2020-03-211-4/+4
| | | | |
* | | | | Merge commit '10.3' into 10.4Oleksandr Byelkin2020-03-111-12/+19
|\ \ \ \ \ | |/ / / /
| * | | | Merge branch '10.2' into 10.3Oleksandr Byelkin2020-03-061-2/+22
| |\ \ \ \ | | |/ / /
| | * | | MDEV-21723 Async slave thread BF abort and replaying fixes (#1448)seppo2020-02-231-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If async replication slave thread conflicts with cluster replication, then the async slave transaction should be BF aborted, and depending on the state of async slave transaction execution, potentially also replayed. There were problems in such BF abort implementation and the replaying was not started. This pull request contains fixes which make sure that if async slave thread is marked to abort and replay, it will complete carry out the rollback and release all locks and resources before starting the replaying. After replaying, async slave transactions is treated as successful, so the slave thread will continue as usual, handling next replication event. There is also new mtr test: galera.galera_slave_replay, which stresses both a certification failure for async slave thread and a successful BF abort followed by replaying.
* | | | | Merge branch '10.3' into 10.4Oleksandr Byelkin2020-01-211-2/+2
|\ \ \ \ \ | |/ / / /
| * | | | Merge branch '10.2' into 10.3Oleksandr Byelkin2020-01-211-2/+2
| |\ \ \ \ | | |/ / /
| | * | | Merge branch '10.1' into 10.2Oleksandr Byelkin2020-01-201-2/+2
| | |\ \ \ | | | |/ /
| | | * | Merge branch '5.5' into 10.1Oleksandr Byelkin2020-01-191-2/+2
| | | |\ \ | | | | |/
| | | | * Use get_ident_len in heartbeat event error messagesMarkus Mäkelä2020-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The string doesn't appear to be null-terminated when binlog checksums are enabled. This causes a corrupt binlog name in the error message when a slave is ahead of the master.
* | | | | MDEV-20821 parallel slave server shutdown hangAndrei Elkin2020-01-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Parallel slave server shutdown found to be hanging in close_connections() triggered by shutdown due to a slave worker thread would not be notified to exit in case the worker was sitting idle. Fixed with destroying the worker pool earlier that is in slave_prepare_for_shutdown() when all their driver threads have already left. A test file is added to simulate the bug condition as well as check multi-sourced and not-idle worker cases.
* | | | | Merge branch '10.3' into 10.4Oleksandr Byelkin2019-12-091-0/+38
|\ \ \ \ \ | |/ / / /
| * | | | Merge branch '10.2' into 10.3Oleksandr Byelkin2019-12-041-0/+38
| |\ \ \ \ | | |/ / /
| | * | | MDEV-18497 : CTAS async replication from mariadb master crashes galera nodes ↵Jan Lindström2019-12-041-9/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#1410) In MariaDB 10.2 master could have been configured so that there is extra annotate events. When we peak next event type for CTAS we need to skip annotate events.
| | * | | Merge branch '10.1' into 10.2Oleksandr Byelkin2019-12-031-1/+32
| | |\ \ \ | | | |/ /
| | | * | MDEV-19572 async slave node fails to apply MyISAM only writes (#1418)seppo2019-11-261-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem happens when MariaDB master replicates writes for only non InnoDB tables (e.g. writes to MyISAM table(s)). Async slave node, in Galera cluster, can apply these writes successfully, but it will, in the end, write gtid position in mysql.gtid_slave_pos table. mysql.gtid_slave_pos table is InnoDB engine, and this write makes innodb handlerton part of the replicated "transaction". Note that wsrep patch identifies that write to gtid_slave_pos should not be replicated and skips appending wsrep keys for these writes. However, as InnoDB was present in the transaction, and there are replication events (for MyISAM table) in transaction cache, but there are no appended keys, wsrep raises an error, and this makes the söave thread to stop. The fix is simply to not treat it as an error if async slave tries to replicate a write set with binlog events, but no keys. We just skip wsrep replication and return successfully. This commit contains also a mtr test which forces mysql.gtid_slave_pos table isto be of InnoDB engine, and executes MyISAM only write through asyn replication. There is additional fix for declaring IO and background slave threads as non wsrep. These threads should not write anything for wsrep replication, and this is just a safeguard to make sure nothing leaks into cluster from these slave threads.
| | | * | MDEV-18497 CTAS async replication from mariadb master crashes galera nodes ↵seppo2019-11-181-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#1410) This PR contains a mtr test for reproducing a failure with replicating create table as select statement (CTAS) through asynchronous mariadb replication to mariadb galera cluster. The problem happens when CTAS replication contains both create table statement followed by row events for populating the table. In such situation, the galera node operating as mariadb replication slave, will first replicate only the create table part into the cluster, and then perform another replication containing both the create table and row events. This will lead all other nodes to fail for duplicate table create attempt, and crash due to this failure. PR contains also a fix, which identifies the situation when CTAS has been replicated, and makes further scan in async replication stream to see if there are following row events. The slave node will replicate either single TOI in case the CTAS table is empty, or if CTAS table contains rows, then single bundled write set with create table and row events is replicated to galera cluster. This fix should keep master server's GTID's for CTAS replication in sync with GTID's in galera cluster.
| | | * | imporve clang buildEugene Kosov2019-06-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug Maintainer mode makes all warnings errors. This patch fix warnings. Mostly about deprecated `register` keyword. Too much warnings came from Mroonga and I gave up on it.
* | | | | Merge 10.3 into 10.4Marko Mäkelä2019-10-101-0/+12
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.2 into 10.3Marko Mäkelä2019-10-091-0/+11
| |\ \ \ \ | | |/ / /
| | * | | MDEV-20574 Position of events reported by mysqlbinlog is wrong with ↵Sachin Setiya2019-10-081-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | encrypted binlogs, SHOW BINLOG EVENTS reports the correct one. Analysis Mysqlbinlog output for encrypted binary log #Q> insert into tab1 values (3,'row 003') #190912 17:36:35 server id 10221 end_log_pos 980 CRC32 0x53bcb3d3 Table_map: `test`.`tab1` mapped to number 19 # at 940 #190912 17:36:35 server id 10221 end_log_pos 1026 CRC32 0xf2ae5136 Write_rows: table id 19 flags: STMT_END_F Here we can see Table_map_log_event ends at 980 but Next event starts at 940. And the reason for that is we do not send START_ENCRYPTION_EVENT to the slave Solution:- Send Start_encryption_log_event as Ignorable_log_event to slave(mysqlbinlog), So that mysqlbinlog can update its log_pos. Since Slave can request multiple FORMAT_DESCRIPTION_EVENT while master does not have so We only update slave master pos when master actually have the FORMAT_DESCRIPTION_EVENT. Similar logic should be applied for START_ENCRYPTION_EVENT. Also added the test case when new server reads the data from old server which does not send START_ENCRYPTION_EVENT to slave. Master Slave Upgrade Scenario. When Slave is updated first, Slave will have extra logic of handling START_ENCRYPTION_EVENT But master willnot be sending START_ENCRYPTION_EVENT. So there will be no issue. When Master is updated first, It will send START_ENCRYPTION_EVENT to slave , But slave will ignore this event in queue_event.