diff options
author | Oran Agra <oran@redislabs.com> | 2023-01-04 10:56:09 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-01-04 10:56:09 +0200 |
commit | 0ecf6cdc0a81f79e6875191ef9aa4f16c9110223 (patch) | |
tree | 8e06fdf4ccd825482a707e913d71e70b531d8179 /tests/integration | |
parent | dec529f4be3e3314300bb513e7a9f3af636e13b0 (diff) | |
download | redis-0ecf6cdc0a81f79e6875191ef9aa4f16c9110223.tar.gz |
fix handshake timeout replication test race (#11640)
Test on ARM + TLS often fail with this error:
```
*** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl
Replica is not able to detect timeout
```
https://github.com/redis/redis-extra-ci/actions/runs/3727554226/jobs/6321797837
The replica logs show that in this case the replica got timeout before even getting a response to the PING command (instead of the SYNC command).
it should have shown these:
```
* MASTER <-> REPLICA sync started
* REPLICAOF 127.0.0.1:22112 enabled ....
### Starting test Slave enters handshake in tests/integration/replication.tcl
* Non blocking connect for SYNC fired the event.
```
then:
```
* Master replied to PING, replication can continue...
* Trying a partial resynchronization (request 50da9eff70d774f4e6cb723eb4b091440f215772:1).
```
and then hang for 5 seconds:
```
# Timeout connecting to the MASTER...
* Reconnecting to MASTER 127.0.0.1:21112 after failure
```
but instead it got this (looks like it disconnected too early, and then tried to re-connect):
```
10890:M 19 Dec 2022 01:32:54.794 * Ready to accept connections tls
10890:M 19 Dec 2022 01:32:54.809 - Accepted 127.0.0.1:41047
10890:M 19 Dec 2022 01:32:54.878 - Reading from client: error:0A000126:SSL routines::unexpected eof while reading
10890:M 19 Dec 2022 01:32:54.925 - Accepted 127.0.0.1:39207
10890:S 19 Dec 2022 01:32:55.463 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
10890:S 19 Dec 2022 01:32:55.463 * Connecting to MASTER 127.0.0.1:24126
10890:S 19 Dec 2022 01:32:55.463 * MASTER <-> REPLICA sync started
10890:S 19 Dec 2022 01:32:55.463 * REPLICAOF 127.0.0.1:24126 enabled (user request from 'id=4 addr=127.0.0.1:39207 laddr=127.0.0.1:24125 fd=8 name= age=1 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=43 qbuf-free=20431 argv-mem=21 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22317 events=r cmd=slaveof user=default redir=-1 resp=2')
### Starting test Slave enters handshake in tests/integration/replication.tcl
10890:S 19 Dec 2022 01:32:55.476 * Non blocking connect for SYNC fired the event.
10890:S 19 Dec 2022 01:33:00.701 # Failed to read response from the server: (null) <- note this!!
10890:S 19 Dec 2022 01:33:00.701 # Master did not respond to command during SYNC handshake
10890:S 19 Dec 2022 01:33:01.002 * Connecting to MASTER 127.0.0.1:24126
10890:S 19 Dec 2022 01:33:01.002 * MASTER <-> REPLICA sync started
### Starting test Slave is able to detect timeout during handshake in tests/integration/replication.tcl
10890:S 19 Dec 2022 01:33:05.497 * Non blocking connect for SYNC fired the event.
10890:S 19 Dec 2022 01:33:05.500 * Master replied to PING, replication can continue...
10890:S 19 Dec 2022 01:33:05.510 * Trying a partial resynchronization (request 947e1956372a0e6c819cfec51c42cc7979b0c221:1).
10890:S 19 Dec 2022 01:34:05.833 # Failed to read response from the server: error:0A000126:SSL routines::unexpected eof while reading
10890:S 19 Dec 2022 01:34:05.833 # Master did not reply to PSYNC, will try later
```
This PR sets enables the 5 seconds timeout at a later stage to try and prevent the early disconnection.
Diffstat (limited to 'tests/integration')
-rw-r--r-- | tests/integration/replication.tcl | 10 |
1 files changed, 5 insertions, 5 deletions
diff --git a/tests/integration/replication.tcl b/tests/integration/replication.tcl index 175506c17..319462695 100644 --- a/tests/integration/replication.tcl +++ b/tests/integration/replication.tcl @@ -20,11 +20,6 @@ start_server {tags {"repl network external:skip"}} { $master config set repl-diskless-sync yes $master config set repl-diskless-sync-delay 1000 - # Use a short replication timeout on the slave, so that if there - # are no bugs the timeout is triggered in a reasonable amount - # of time. - $slave config set repl-timeout 5 - # Start the replication process... $slave slaveof $master_host $master_port @@ -36,6 +31,11 @@ start_server {tags {"repl network external:skip"}} { } } + # Use a short replication timeout on the slave, so that if there + # are no bugs the timeout is triggered in a reasonable amount + # of time. + $slave config set repl-timeout 5 + # But make the master unable to send # the periodic newlines to refresh the connection. The slave # should detect the timeout. |