author Ilya Maximets <i.maximets@ovn.org> 2021-05-06 14:47:31 +0200
committer Ilya Maximets <i.maximets@ovn.org> 2021-05-14 16:00:22 +0200
commit 3c2d6274bceecb65ec8f2f93f2aac26897a7ddfe
tree acd05ca86f831079bb122cbb1521685c2caaaf77 /utilities
parent b5bb044fbe4c1395dcde5cc7d5081ef0099bb8b3
raft: Transfer leadership before creating snapshots.
With a big database, writing a snapshot can take a lot of time. For example, on one of the systems, compaction of a 300MB database takes about 10 seconds to complete. For the clustered database, 40% of this time is spent converting the database to the file transaction JSON format; the rest is spent formatting a string and writing it to disk. Of course, this depends heavily on disk and CPU speeds. 300MB is a very plausible database size for the OVN Southbound DB, and it might be even bigger than that.

During compaction the database is not available and ovsdb-server doesn't do any other tasks. If the leader spends 10-15 seconds writing a snapshot, the cluster is not functional for that time period. The leader also likely has some monitors to serve, so a single poll interval may end up being 15-20 seconds long. Systems with such big databases typically have very high election timers configured (16 seconds), so followers will start an election only after this significant amount of time. Once the leader is back in the operational state, it will re-connect and try to rejoin the cluster. In some cases, this might also cause the 'connected' state to flap on the old leader, triggering a re-connection of clients. This issue has been observed with large-scale OVN deployments.

One of the methods to improve the situation is to transfer leadership before compacting. This keeps the cluster functional while one of the servers writes a snapshot.

Additionally, log the time spent on compaction if it was longer than 1 second. This adds a bit of visibility into 'unreasonably long poll interval's.

Reported-at: https://bugzilla.redhat.com/1960391
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
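
The idea described in the commit message can be illustrated with a small toy model. This is only a sketch in Python, not the actual ovsdb-server code (which is written in C): the `Server` class, its methods, and the `SLOW_COMPACTION_MS` constant are hypothetical names made up for illustration, and the leadership handover is simplified to a synchronous assignment, whereas a real Raft transfer is asynchronous and depends on the state of the target server.

```python
import time

# Hypothetical threshold mirroring the commit message: report compactions
# that took longer than 1 second.
SLOW_COMPACTION_MS = 1000


class Server:
    """Toy stand-in for one clustered database server (not the OVS API)."""

    def __init__(self, name, is_leader=False):
        self.name = name
        self.is_leader = is_leader
        self.log = []

    def transfer_leadership(self, peers):
        """Hand leadership to the first peer, if this server is the leader."""
        if not self.is_leader or not peers:
            return None
        target = peers[0]
        self.is_leader = False
        target.is_leader = True
        self.log.append(f"transferred leadership to {target.name}")
        return target

    def compact(self, peers, write_snapshot, clock=time.monotonic):
        """Step down first, then write the snapshot and time it."""
        self.transfer_leadership(peers)
        start = clock()
        write_snapshot()  # the slow, blocking part
        elapsed_ms = int((clock() - start) * 1000)
        if elapsed_ms > SLOW_COMPACTION_MS:
            self.log.append(f"compaction took {elapsed_ms} ms")
        return elapsed_ms
```

With a fake clock that advances by 10 seconds across the snapshot write, server `a` gives up leadership to `b` before blocking and then records the slow compaction in its log, so the cluster keeps a working leader for the whole duration.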
Diffstat (limited to 'utilities')
0 files changed, 0 insertions, 0 deletions