author: Binbin <binloveplay1314@qq.com> 2023-03-12 19:25:10 +0800
committer: GitHub <noreply@github.com> 2023-03-12 13:25:10 +0200
commit: 4e7eb16ae70d2664d169e412b965f6e9143de7a0
tree: 6d2d3c4ebe3581dd7887daf2520e84ed63ef2b98 /tests
parent: 4ba47d2d2163ea77aacc9f719db91af2d7298905
Fix race in sentinel manual failover test (#11900)
In #9408, we added some SENTINEL DEBUG calls to reduce the default timeouts and allow tests to execute faster. The change in 05-manual.tcl may cause a race in which SENTINEL FAILOVER responds with NOGOODSLAVE:

```
Manual failover works: FAILED: Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test)
(Jumping to next unit after error)
FAILED: caught an error in the test assertion:Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test)
```

The reason is that the info-period value was reduced in #9408 (the default value is 10000) and the manual failover was then performed immediately. INFO may not yet have been exchanged between the sentinel and the replicas, so the sentinel skips all the replicas in sentinelSelectSlave (because each replica's info_refresh has not been updated, see the code snippet below), returns NOGOODSLAVE, and breaks the test.

Code snippet from sentinelSelectSlave:

```
while((de = dictNext(di)) != NULL) {
    sentinelRedisInstance *slave = dictGetVal(de);
    mstime_t info_validity_time;

    if (master->flags & SRI_S_DOWN)
        info_validity_time = sentinel_ping_period*5;
    else
        info_validity_time = sentinel_info_period*3;
    if (mstime() - slave->info_refresh > info_validity_time) continue;
}
```

Adding a wait_for_condition gives the sentinel the opportunity to update the info_refresh of the replicas.
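To make the timing argument concrete, here is a minimal standalone sketch of the staleness check, not the actual sentinel code. The helper names `info_validity_time` and `replica_info_is_stale` are hypothetical, and the reduced info-period of 100 ms and the ping period of 1000 ms are assumed values chosen for illustration (only the default info-period of 10000 comes from the message above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t mstime_t;

/* Hypothetical helper: computes the validity window the same way the
 * snippet from sentinelSelectSlave does. All values are milliseconds. */
mstime_t info_validity_time(bool master_sdown, mstime_t ping_period,
                            mstime_t info_period) {
    assert(ping_period > 0 && info_period > 0);
    return master_sdown ? ping_period * 5 : info_period * 3;
}

/* Hypothetical helper: true when the replica would be skipped because
 * its last INFO refresh is older than the validity window. */
bool replica_info_is_stale(mstime_t now, mstime_t info_refresh,
                           mstime_t validity) {
    return now - info_refresh > validity;
}
```

With the default info-period of 10000 the window is 30000 ms, so a replica whose INFO was refreshed 1000 ms ago is still eligible. With an assumed reduced info-period of 100 the window shrinks to 300 ms, and the same replica is skipped until a fresh INFO reply arrives, which is the gap the added wait_for_condition covers.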
Diffstat (limited to 'tests')
-rw-r--r-- tests/sentinel/tests/05-manual.tcl | 15
1 files changed, 14 insertions, 1 deletions
diff --git a/tests/sentinel/tests/05-manual.tcl b/tests/sentinel/tests/05-manual.tcl
index a0004eb75..72d80fdf8 100644
--- a/tests/sentinel/tests/05-manual.tcl
+++ b/tests/sentinel/tests/05-manual.tcl
@@ -12,8 +12,21 @@ test "Manual failover works" {
set old_port [RPort $master_id]
set addr [S 0 SENTINEL GET-MASTER-ADDR-BY-NAME mymaster]
assert {[lindex $addr 1] == $old_port}
+
+ # Since we reduced the info-period (default 10000) above immediately,
+ # sentinel - replica may not have enough time to exchange INFO and update
+ # the replica's info-period, so the test may get a NOGOODSLAVE.
+ wait_for_condition 300 50 {
+ [catch {S 0 SENTINEL FAILOVER mymaster}] == 0
+ } else {
+ catch {S 0 SENTINEL FAILOVER mymaster} reply
+ puts [S 0 SENTINEL REPLICAS mymaster]
+ fail "Sentinel manual failover did not work, got: $reply"
+ }
+
catch {S 0 SENTINEL FAILOVER mymaster} reply
- assert {$reply eq "OK"}
+ assert_match {*INPROG*} $reply ;# Failover already in progress
+
foreach_sentinel_id id {
wait_for_condition 1000 50 {
[lindex [S $id SENTINEL GET-MASTER-ADDR-BY-NAME mymaster] 1] != $old_port