diff options
author | sjaakola <seppo.jaakola@iki.fi> | 2020-12-09 21:53:18 +0200 |
---|---|---|
committer | Jan Lindström <jan.lindstrom@mariadb.com> | 2021-01-18 08:09:06 +0200 |
commit | beaea31ab12ab56ea8a6eb5e99cf82648675ea78 (patch) | |
tree | 92247ad2c9fcfa61e97f151028c5f77573039880 /mysql-test | |
parent | cf6114ebea0224f265d1ac25592fce9b984b0e6e (diff) | |
download | mariadb-git-beaea31ab12ab56ea8a6eb5e99cf82648675ea78.tar.gz |
MDEV-23851 BF-BF Conflict issue because of UK GAP locks
Some DML operations on tables having unique secondary keys cause scanning
in the secondary index, for instance to find potential unique key violations
in the seconday index. This scanning may involve GAP locking in the index.
As this locking happens also when applying replication events in high priority
applier threads, there is a probabality for lock conflicts between two wsrep
high priority threads.
This PR avoids lock conflicts of high priority wsrep threads, which do
secondary index scanning e.g. for duplicate key detection.
The actual fix is the patch in sql_class.cc:thd_need_ordering_with(), where
we allow relaxed GAP locking protocol between wsrep high priority threads.
wsrep high priority threads (replication appliers, replayers and TOI processors)
are ordered by the replication provider, and they will not need serializability
support gained by secondary index GAP locks.
PR contains also a mtr test, which exercises a scenario where two replication
applier threads have a false positive conflict in GAP of unique secondary index.
The conflicting local committing transaction has to replay, and the test verifies
also that the replaying phase will not conflict with the latter repllication applier.
Commit also contains new test scenario for galera.galera_UK_conflict.test,
where replayer starts applying after a slave applier thread, with later seqno,
has advanced to commit phase. The applier and replayer have false positive GAP
lock conflict on secondary unique index, and replayer should ignore this.
This test scenario caused crash with earlier version in this PR, and to fix this,
the secondary index uniquenes checking has been relaxed even further.
Now innodb trx_t structure has new member: bool wsrep_UK_scan, which is set to
true, when high priority thread is performing unique secondary index scanning.
The member trx_t::wsrep_UK_scan is defined inside WITH_WSREP directive, to make
it possible to prepare a MariaDB build where this additional trx_t member is
not present and is not used in the code base. trx->wsrep_UK_scan is set to true
only for the duration of function call for: lock_rec_lock() trx->wsrep_UK_scan
is used only in lock_rec_has_to_wait() function to relax the need to wait if
wsrep_UK_scan is set and conflicting transaction is also high priority.
Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
Diffstat (limited to 'mysql-test')
-rw-r--r-- | mysql-test/suite/galera/r/galera_UK_conflict.result | 89 | ||||
-rw-r--r-- | mysql-test/suite/galera/t/galera_UK_conflict.test | 148 |
2 files changed, 237 insertions, 0 deletions
diff --git a/mysql-test/suite/galera/r/galera_UK_conflict.result b/mysql-test/suite/galera/r/galera_UK_conflict.result new file mode 100644 index 00000000000..76649f1b268 --- /dev/null +++ b/mysql-test/suite/galera/r/galera_UK_conflict.result @@ -0,0 +1,89 @@ +CREATE TABLE t1 (f1 INTEGER PRIMARY KEY, f2 int, f3 int, unique key keyj (f2)); +INSERT INTO t1 VALUES (1, 1, 0); +INSERT INTO t1 VALUES (3, 3, 0); +INSERT INTO t1 VALUES (10, 10, 0); +SET GLOBAL wsrep_slave_threads = 3; +SET GLOBAL DEBUG_DBUG = "d,sync.wsrep_apply_cb"; +connection node_1; +SET SESSION wsrep_sync_wait=0; +START TRANSACTION; +DELETE FROM t1 WHERE f2 = 3; +INSERT INTO t1 VALUES (3, 3, 1); +connect node_1a, 127.0.0.1, root, , test, $NODE_MYPORT_1; +connection node_1a; +SET SESSION wsrep_sync_wait=0; +connection node_2; +INSERT INTO t1 VALUES (5, 5, 2); +connection node_1a; +SET SESSION DEBUG_SYNC = "now WAIT_FOR sync.wsrep_apply_cb_reached"; +SET GLOBAL wsrep_provider_options = 'dbug=d,apply_monitor_slave_enter_sync'; +connection node_2; +INSERT INTO t1 VALUES (4, 4, 2); +connection node_1a; +SET SESSION wsrep_on = 0; +SET SESSION wsrep_on = 1; +SET GLOBAL wsrep_provider_options = 'dbug='; +SET GLOBAL wsrep_provider_options = 'dbug=d,commit_monitor_enter_sync'; +connection node_1; +COMMIT; +connection node_1a; +SET SESSION wsrep_on = 0; +SET SESSION wsrep_on = 1; +SET GLOBAL wsrep_provider_options = 'dbug='; +SET GLOBAL wsrep_provider_options = 'signal=commit_monitor_enter_sync'; +SET GLOBAL wsrep_provider_options = 'dbug='; +SET GLOBAL DEBUG_DBUG = ""; +SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb"; +SET GLOBAL debug_dbug = NULL; +SET debug_sync='RESET'; +SET GLOBAL DEBUG_DBUG = "d,sync.wsrep_apply_cb"; +SET GLOBAL wsrep_provider_options = 'signal=apply_monitor_slave_enter_sync'; +SET SESSION DEBUG_SYNC = "now WAIT_FOR sync.wsrep_apply_cb_reached"; +SET GLOBAL wsrep_provider_options = 'dbug=d,commit_monitor_enter_sync'; +SET GLOBAL wsrep_provider_options = 'dbug='; +SET GLOBAL DEBUG_DBUG = ""; +SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb"; +SET GLOBAL debug_dbug = NULL; +SET debug_sync='RESET'; +SET GLOBAL wsrep_provider_options = 'signal=commit_monitor_enter_sync'; +SET GLOBAL wsrep_provider_options = 'dbug='; +connection node_1; +SELECT * FROM t1; +f1 f2 f3 +1 1 0 +3 3 1 +4 4 2 +5 5 2 +10 10 0 +wsrep_local_replays +1 +SET GLOBAL wsrep_slave_threads = DEFAULT; +connection node_2; +SELECT * FROM t1; +f1 f2 f3 +1 1 0 +3 3 1 +4 4 2 +5 5 2 +10 10 0 +INSERT INTO t1 VALUES (7,7,7); +INSERT INTO t1 VALUES (8,8,8); +SELECT * FROM t1; +f1 f2 f3 +1 1 0 +3 3 1 +4 4 2 +5 5 2 +7 7 7 +8 8 8 +10 10 0 +connection node_1; +SELECT * FROM t1; +f1 f2 f3 +1 1 0 +3 3 1 +4 4 2 +5 5 2 +7 7 7 +10 10 0 +DROP TABLE t1; diff --git a/mysql-test/suite/galera/t/galera_UK_conflict.test b/mysql-test/suite/galera/t/galera_UK_conflict.test new file mode 100644 index 00000000000..57bafbf8ae0 --- /dev/null +++ b/mysql-test/suite/galera/t/galera_UK_conflict.test @@ -0,0 +1,148 @@ +# +# This test tests the operation of transaction replay with a scenario +# where two subsequent write sets in applying conflict with local transaction +# in commit phase. The conflict is "false positive" confict on GAP lock in +# secondary unique index. +# The first applier will cause BF abort for the local committer, which +# starts replaying because of positive certification. +# In buggy version, scenatio continues so that ehile the local transaction +# is replaying, the latter applier experiences similar UK GAP lock conflict +# and forces the replayer to abort second time. +# In fixed version, this latter BF abort should not happen. +# + +--source include/galera_cluster.inc +--source include/have_innodb.inc +--source include/have_debug_sync.inc +--source include/galera_have_debug_sync.inc + +--let $wsrep_local_replays_old = `SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_local_replays'` + +CREATE TABLE t1 (f1 INTEGER PRIMARY KEY, f2 int, f3 int, unique key keyj (f2)); +INSERT INTO t1 VALUES (1, 1, 0); +INSERT INTO t1 VALUES (3, 3, 0); +INSERT INTO t1 VALUES (10, 10, 0); + +# we will need 2 appliers threads for applyin two write sets in parallel in node1 +# and 1 applier thread for handling replaying +SET GLOBAL wsrep_slave_threads = 3; +SET GLOBAL DEBUG_DBUG = "d,sync.wsrep_apply_cb"; + +--connection node_1 +# starting a transaction, which deletes and inserts the middle row in test table +# this will be victim of false positive conflict with appliers +SET SESSION wsrep_sync_wait=0; +START TRANSACTION; + +DELETE FROM t1 WHERE f2 = 3; +INSERT INTO t1 VALUES (3, 3, 1); + +# Control connection to manage sync points for appliers +--connect node_1a, 127.0.0.1, root, , test, $NODE_MYPORT_1 +--connection node_1a +SET SESSION wsrep_sync_wait=0; + +# send from node 2 first INSERT transaction, which will conflict on GAP lock in node 1 +--connection node_2 +INSERT INTO t1 VALUES (5, 5, 2); + +--connection node_1a +# wait to see the INSERT in apply_cb sync point +SET SESSION DEBUG_SYNC = "now WAIT_FOR sync.wsrep_apply_cb_reached"; + +# first applier seen in wait point, set sync point for the second INSERT +--let $galera_sync_point = apply_monitor_slave_enter_sync +--source include/galera_set_sync_point.inc + +--connection node_2 +# send second insert into same GAP in test table +INSERT INTO t1 VALUES (4, 4, 2); + +--connection node_1a +# wait for the second insert to arrive in his sync point +--let $galera_sync_point = apply_monitor_slave_enter_sync +--source include/galera_wait_sync_point.inc +--source include/galera_clear_sync_point.inc + +# both appliers are now waiting in separate sync points + +# Block the local commit, send the COMMIT and wait until it gets blocked +--let $galera_sync_point = commit_monitor_enter_sync +--source include/galera_set_sync_point.inc + +--connection node_1 +--send COMMIT + +--connection node_1a +# wait for the local commit to enter in commit monitor wait state +--let $galera_sync_point = apply_monitor_slave_enter_sync commit_monitor_enter_sync +--source include/galera_wait_sync_point.inc +--source include/galera_clear_sync_point.inc + +# release the local transaction to continue with commit +--let $galera_sync_point = commit_monitor_enter_sync +--source include/galera_signal_sync_point.inc +--source include/galera_clear_sync_point.inc + +# and now release the first applier, it should force local trx to abort +SET GLOBAL DEBUG_DBUG = ""; +SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb"; +SET GLOBAL debug_dbug = NULL; +SET debug_sync='RESET'; + +# set another sync point for second applier +SET GLOBAL DEBUG_DBUG = "d,sync.wsrep_apply_cb"; + +# letting the second appier to move forward +--let $galera_sync_point = apply_monitor_slave_enter_sync +--source include/galera_signal_sync_point.inc + +# waiting until second applier is in wait +SET SESSION DEBUG_SYNC = "now WAIT_FOR sync.wsrep_apply_cb_reached"; + +# stopping second applier before commit +--let $galera_sync_point = commit_monitor_enter_sync +--source include/galera_set_sync_point.inc +--source include/galera_clear_sync_point.inc + +# releasing the second insert, with buggy version it will conflict with +# replayer +SET GLOBAL DEBUG_DBUG = ""; +SET DEBUG_SYNC = "now SIGNAL signal.wsrep_apply_cb"; +SET GLOBAL debug_dbug = NULL; +SET debug_sync='RESET'; + +# with fixed version, second applier has reached commit monitor, and we can +# release it to complete +--let $galera_sync_point = commit_monitor_enter_sync +--source include/galera_signal_sync_point.inc +--source include/galera_clear_sync_point.inc + +# local commit should succeed +--connection node_1 +--reap + +SELECT * FROM t1; + +# wsrep_local_replays has increased by 1 +--let $wsrep_local_replays_new = `SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_local_replays'` +--disable_query_log +--eval SELECT $wsrep_local_replays_new - $wsrep_local_replays_old = 1 AS wsrep_local_replays; +--enable_query_log + +# returning original slave thread count +SET GLOBAL wsrep_slave_threads = DEFAULT; + +--connection node_2 +SELECT * FROM t1; + +# replicate some transactions, so that wsrep slave thread count can reach +# original state in node 1 +INSERT INTO t1 VALUES (7,7,7); +INSERT INTO t1 VALUES (8,8,8); +SELECT * FROM t1; + +--connection node_1 +SELECT * FROM t1; + +DROP TABLE t1; |