summaryrefslogtreecommitdiff
path: root/tests
diff options
context:
space:
mode:
authorOran Agra <oran@redislabs.com>2023-01-04 10:56:09 +0200
committerGitHub <noreply@github.com>2023-01-04 10:56:09 +0200
commit0ecf6cdc0a81f79e6875191ef9aa4f16c9110223 (patch)
tree8e06fdf4ccd825482a707e913d71e70b531d8179 /tests
parentdec529f4be3e3314300bb513e7a9f3af636e13b0 (diff)
downloadredis-0ecf6cdc0a81f79e6875191ef9aa4f16c9110223.tar.gz
fix handshake timeout replication test race (#11640)
Test on ARM + TLS often fail with this error: ``` *** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl Replica is not able to detect timeout ``` https://github.com/redis/redis-extra-ci/actions/runs/3727554226/jobs/6321797837 The replica logs show that in this case the replica got timeout before even getting a response to the PING command (instead of the SYNC command). it should have shown these: ``` * MASTER <-> REPLICA sync started * REPLICAOF 127.0.0.1:22112 enabled .... ### Starting test Slave enters handshake in tests/integration/replication.tcl * Non blocking connect for SYNC fired the event. ``` then: ``` * Master replied to PING, replication can continue... * Trying a partial resynchronization (request 50da9eff70d774f4e6cb723eb4b091440f215772:1). ``` and then hang for 5 seconds: ``` # Timeout connecting to the MASTER... * Reconnecting to MASTER 127.0.0.1:21112 after failure ``` but instead it got this (looks like it disconnected too early, and then tried to re-connect): ``` 10890:M 19 Dec 2022 01:32:54.794 * Ready to accept connections tls 10890:M 19 Dec 2022 01:32:54.809 - Accepted 127.0.0.1:41047 10890:M 19 Dec 2022 01:32:54.878 - Reading from client: error:0A000126:SSL routines::unexpected eof while reading 10890:M 19 Dec 2022 01:32:54.925 - Accepted 127.0.0.1:39207 10890:S 19 Dec 2022 01:32:55.463 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer. 10890:S 19 Dec 2022 01:32:55.463 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:32:55.463 * MASTER <-> REPLICA sync started 10890:S 19 Dec 2022 01:32:55.463 * REPLICAOF 127.0.0.1:24126 enabled (user request from 'id=4 addr=127.0.0.1:39207 laddr=127.0.0.1:24125 fd=8 name= age=1 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=43 qbuf-free=20431 argv-mem=21 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22317 events=r cmd=slaveof user=default redir=-1 resp=2') ### Starting test Slave enters handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:32:55.476 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:00.701 # Failed to read response from the server: (null) <- note this!! 10890:S 19 Dec 2022 01:33:00.701 # Master did not respond to command during SYNC handshake 10890:S 19 Dec 2022 01:33:01.002 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:33:01.002 * MASTER <-> REPLICA sync started ### Starting test Slave is able to detect timeout during handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:33:05.497 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:05.500 * Master replied to PING, replication can continue... 10890:S 19 Dec 2022 01:33:05.510 * Trying a partial resynchronization (request 947e1956372a0e6c819cfec51c42cc7979b0c221:1). 10890:S 19 Dec 2022 01:34:05.833 # Failed to read response from the server: error:0A000126:SSL routines::unexpected eof while reading 10890:S 19 Dec 2022 01:34:05.833 # Master did not reply to PSYNC, will try later ``` This PR sets enables the 5 seconds timeout at a later stage to try and prevent the early disconnection.
Diffstat (limited to 'tests')
-rw-r--r--tests/integration/replication.tcl10
1 files changed, 5 insertions, 5 deletions
diff --git a/tests/integration/replication.tcl b/tests/integration/replication.tcl
index 175506c17..319462695 100644
--- a/tests/integration/replication.tcl
+++ b/tests/integration/replication.tcl
@@ -20,11 +20,6 @@ start_server {tags {"repl network external:skip"}} {
$master config set repl-diskless-sync yes
$master config set repl-diskless-sync-delay 1000
- # Use a short replication timeout on the slave, so that if there
- # are no bugs the timeout is triggered in a reasonable amount
- # of time.
- $slave config set repl-timeout 5
-
# Start the replication process...
$slave slaveof $master_host $master_port
@@ -36,6 +31,11 @@ start_server {tags {"repl network external:skip"}} {
}
}
+ # Use a short replication timeout on the slave, so that if there
+ # are no bugs the timeout is triggered in a reasonable amount
+ # of time.
+ $slave config set repl-timeout 5
+
# But make the master unable to send
# the periodic newlines to refresh the connection. The slave
# should detect the timeout.