ovsdb: raft: Fix transaction double commit due to lost leadership.

While becoming a follower, the leader aborts all the current 'in-flight' commands, so the higher layers can re-try corresponding transactions when the new leader is elected. However, most of these commands are already sent to followers as append requests, hence they will actually be committed by the majority of the cluster members, i.e. will be treated as committed by the new leader, unless there is an actual network problem between servers. However, the old leader will decline append replies, since it's not the leader anymore and commands are already completed with RAFT_CMD_LOST_LEADERSHIP status. New leader will replicate the commit index back to the old leader. Old leader will re-try the previously "failed" transaction, because "cluster error"s are temporary. If a transaction had some prerequisites that didn't allow double committing or there are other database constraints (like indexes) that will not allow a transaction to be committed twice, the server will reply to the client with a false-negative transaction result. If there are no prerequisites or additional database constraints, the server will execute the same transaction again, but as a follower. E.g. in the OVN case, this may result in creation of duplicated logical switches / routers / load balancers. I.e. resources with the same non-indexed name. That may cause issues later where ovn-nbctl will not be able to add ports to these switches / routers. Suggested solution is to not complete (abort) the commands, but allow them to be completed with the commit index update from a new leader. It the similar behavior to what we do in order to complete commands in a backward scenario when the follower becomes a leader. That scenario was fixed by commit 5a9b53a51ec9 ("ovsdb raft: Fix duplicated transaction execution when leader failover."). Code paths for leader and follower inside the raft_update_commit_index were very similar, so they were refactored into one, since we also needed an ability to complete more than one command for a follower. Failure test added to exercise scenario of a leadership transfer. Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.") Fixes: 3c2d6274bcee ("raft: Transfer leadership before creating snapshots.") Reported-at: https://bugzilla.redhat.com/2046340 Acked-by: Han Zhou <hzhou@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
author: Ilya Maximets <i.maximets@ovn.org> 2022-05-24 22:18:00 +0200
committer: Ilya Maximets <i.maximets@ovn.org> 2022-05-26 11:43:53 +0200
commit: 04e5adfedd2a2e4ceac62136671057542a7cf875 (patch)
tree: 27c2f90d2946af235e629a790f3f2e1d05be302c /tests
parent: 336d7dd7cc6f2eb4a10601bff6ab74449324ba0e (diff)
download: openvswitch-04e5adfedd2a2e4ceac62136671057542a7cf875.tar.gz
1 files changed, 21 insertions, 6 deletions
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index ee9c7b937..0f7076a05 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -468,6 +468,7 @@ ovsdb_cluster_failure_test () {
     if test "$crash_node" == "1"; then
         new_leader=$5
     fi
+    log_grep=$6
 
     cp $top_srcdir/vswitchd/vswitch.ovsschema schema
     schema=`ovsdb-tool schema-name schema`
@@ -488,7 +489,7 @@ ovsdb_cluster_failure_test () {
     start_server() {
         local i=$1
         printf "\ns$i: starting\n"
-        AT_CHECK([ovsdb-server -vjsonrpc -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
+        AT_CHECK([ovsdb-server -vjsonrpc -vraft -vconsole:off -vsyslog:off --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb s$i.db])
     }
     connect_server() {
         local i=$1
@@ -514,14 +515,23 @@ ovsdb_cluster_failure_test () {
         fi
         AT_CHECK([ovs-appctl -t "`pwd`"/s$delay_election_node cluster/failure-test delay-election], [0], [ignore])
     fi
+
+    # Initializing the database separately to avoid extra 'wait' operation
+    # in later transactions.
+    AT_CHECK([ovs-vsctl -v --db="$db" --no-leader-only --no-shuffle-remotes --no-wait init], [0], [ignore], [ignore])
+
     AT_CHECK([ovs-appctl -t "`pwd`"/s$crash_node cluster/failure-test $crash_command], [0], [ignore])
     AT_CHECK([ovs-vsctl -v --db="$db" --no-leader-only --no-shuffle-remotes --no-wait create QoS type=x], [0], [ignore], [ignore])
 
-    # Make sure that the node really crashed.
-    AT_CHECK([ls s$crash_node.ovsdb], [2], [ignore], [ignore])
-    # XXX: Client will fail if remotes contains unix socket that doesn't exist (killed).
-    if test "$remote_1" = "$crash_node"; then
-        db=unix:s$remote_2.ovsdb
+    # Make sure that the node really crashed or has specific log message.
+    if test -z "$log_grep"; then
+        AT_CHECK([ls s$crash_node.ovsdb], [2], [ignore], [ignore])
+        # XXX: Client will fail if remotes contains unix socket that doesn't exist (killed).
+        if test "$remote_1" = "$crash_node"; then
+            db=unix:s$remote_2.ovsdb
+        fi
+    else
+        OVS_WAIT_UNTIL([grep -q "$log_grep" s${crash_node}.log])
     fi
     AT_CHECK([ovs-vsctl --db="$db" --no-leader-only --no-wait --columns=type --bare list QoS], [0], [x
 ])
@@ -617,6 +627,11 @@ AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
 ovsdb_cluster_failure_test 2 2 3 crash-after-receiving-append-request-update
 AT_CLEANUP
 
+AT_SETUP([OVSDB cluster - txn on leader, leader transfers leadership after sending appendReq])
+AT_KEYWORDS([ovsdb server negative unix cluster pending-txn transfer])
+ovsdb_cluster_failure_test 1 2 1 transfer-leadership-after-sending-append-request -1 "Transferring leadership"
+AT_CLEANUP
+
 
 AT_SETUP([OVSDB cluster - competing candidates])
 AT_KEYWORDS([ovsdb server negative unix cluster competing-candidates])
author	Ilya Maximets <i.maximets@ovn.org>	2022-05-24 22:18:00 +0200
committer	Ilya Maximets <i.maximets@ovn.org>	2022-05-26 11:43:53 +0200
commit	04e5adfedd2a2e4ceac62136671057542a7cf875 (patch)
tree	27c2f90d2946af235e629a790f3f2e1d05be302c /tests
parent	336d7dd7cc6f2eb4a10601bff6ab74449324ba0e (diff)
download	openvswitch-04e5adfedd2a2e4ceac62136671057542a7cf875.tar.gz