author | Ilya Maximets <i.maximets@ovn.org> | 2021-12-13 16:43:33 +0100 |
---|---|---|
committer | Ilya Maximets <i.maximets@ovn.org> | 2021-12-13 21:54:45 +0100 |
commit | 339f97044e3c2312fbb65b932fa14a181acf40d5 (patch) | |
tree | f0bfc634040c6b5a988823b9b9b963758d4751eb /ovsdb | |
parent | bf07cc9cdb2f37fede8c0363937f1eb9f4cfd730 (diff) | |
download | openvswitch-339f97044e3c2312fbb65b932fa14a181acf40d5.tar.gz |
ovsdb: storage: Randomize should_snapshot checks when the minimum time passed.
Snapshots are scheduled for every 10-20 minutes. It's a random value
in this interval for each server. Once the time is up, but the maximum
time (24 hours) has not been reached yet, ovsdb will start checking on
every iteration whether the log grew a lot. Once the growth is detected,
compaction is triggered.
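
The check described above can be sketched roughly as follows. This is a simplified illustration, not the code in ovsdb/storage.c: the struct, the function name and the `log_grew_lots` flag are hypothetical, and the sketch assumes that reaching the maximum time forces a snapshot regardless of growth, which the text above implies but does not spell out.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one server's snapshot schedule.
 * All times are in milliseconds. */
struct snap_sched {
    long long next_min;   /* Earliest time a snapshot is considered. */
    long long next_max;   /* Assumed: past this time, growth is not needed. */
};

/* Returns true if a snapshot should be taken at time 'now'.
 * 'log_grew_lots' stands in for the real log-growth check. */
static bool
should_snapshot_sketch(const struct snap_sched *s, long long now,
                       bool log_grew_lots)
{
    if (now < s->next_min) {
        return false;             /* Minimum time not reached yet. */
    }
    if (now >= s->next_max) {
        return true;              /* Maximum time reached (assumption). */
    }
    return log_grew_lots;         /* In between: only on large growth. */
}
```

Between the two limits the result depends only on log growth, which is why servers that share a log end up agreeing on the moment to compact.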
On the other hand, it's very common for an OVSDB cluster to not have the
log growing very fast. If the log didn't grow 2x in 20 minutes, the
randomness of the initial scheduled time is gone and all the servers are
checking if they need to create a snapshot on every iteration. And since
all of them are part of the same cluster, their logs are growing at the
same speed. Once the critical mass is reached, all the servers will start
creating snapshots at the same time. If the database is big enough,
that might leave the cluster unresponsive for an extended period of
time (e.g. 10-15 seconds for the OVN_Southbound database in a larger scale
OVN deployment) until the compaction completes.
Fix that by re-scheduling a quick retry if the minimal time has already
passed. Effectively, this will work as a randomized 1-2 min delay
between checks, so the servers will not synchronize.
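
A small, deterministic sketch of why the delay helps. The helper below is hypothetical, and a fixed per-server `delay` stands in for the random 1-2 minute value that would be drawn at each reschedule. It computes the first check that happens at or after the moment the log reaches the critical size.

```c
/* Returns the time (ms) of the first check at or after 'growth_time' for a
 * server that starts checking at 'first_check' and then re-checks every
 * 'delay' ms.  A 'delay' of 0 models the old behavior, where the check runs
 * on every iteration and therefore fires as soon as growth is visible. */
static long long
first_check_after(long long first_check, long long delay,
                  long long growth_time)
{
    if (first_check >= growth_time) {
        return first_check;       /* Not checking yet when growth happens. */
    }
    if (delay == 0) {
        return growth_time;       /* Checking continuously. */
    }
    /* Round the number of elapsed re-check periods up. */
    long long steps = (growth_time - first_check + delay - 1) / delay;
    return first_check + steps * delay;
}
```

With growth at 45 minutes (2700000 ms) and first checks at 11, 15 and 19 minutes, a delay of 0 makes all three servers fire at 2700000 ms. With illustrative delays of 70, 100 and 110 seconds they fire at 2760000, 2700000 and 2790000 ms instead.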
The scheduling function is updated to not change the upper limit on quick
reschedules, to avoid delaying the snapshot creation indefinitely.
Currently, quick re-schedules are only used for the error cases, and
there is always a 'slow' re-schedule after a successful compaction.
So, the change to the scheduling function doesn't change the current
behavior much.
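
A minimal sketch of the updated scheduling rule, under stated assumptions: the struct and function names are hypothetical, `rnd` is a caller-supplied stand-in for the random offset, and the base/range values (10 minutes each, divided by 10 for a quick retry) are inferred from the 10-20 minute and 1-2 minute intervals mentioned above rather than copied from the source file.

```c
#include <stdbool.h>

/* Hypothetical schedule state, times in milliseconds. */
struct sched_sketch {
    long long next_min;
    long long next_max;
};

/* Re-schedules the next snapshot check.  'rnd' replaces the random offset
 * so that the example is deterministic; it is reduced modulo the range. */
static void
schedule_sketch(struct sched_sketch *s, long long now, bool quick,
                long long rnd)
{
    long long base = 10LL * 60 * 1000;    /* Assumed: 10 minutes. */
    long long range = 10LL * 60 * 1000;   /* Assumed: up to 10 more. */

    if (quick) {
        base /= 10;                       /* Assumed: 1-2 minutes. */
        range /= 10;
    }
    s->next_min = now + base + rnd % range;
    if (!quick) {
        /* Only a slow re-schedule moves the upper limit (1 day ahead). */
        s->next_max = now + 60LL * 60 * 24 * 1000;
    }
}
```

Because a quick retry leaves `next_max` untouched, repeated quick retries cannot push the upper limit forward.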
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Diffstat (limited to 'ovsdb')
-rw-r--r-- | ovsdb/storage.c | 17 |
-rw-r--r-- | ovsdb/storage.h | 2 |
2 files changed, 16 insertions, 3 deletions
diff --git a/ovsdb/storage.c b/ovsdb/storage.c
index 9e32efe58..d4984be25 100644
--- a/ovsdb/storage.c
+++ b/ovsdb/storage.c
@@ -507,7 +507,11 @@ schedule_next_snapshot(struct ovsdb_storage *storage, bool quick)
 
         long long int now = time_msec();
         storage->next_snapshot_min = now + base + random_range(range);
-        storage->next_snapshot_max = now + 60LL * 60 * 24 * 1000; /* 1 day */
+        if (!quick) {
+            long long int one_day = 60LL * 60 * 24 * 1000;
+
+            storage->next_snapshot_max = now + one_day;
+        }
     } else {
         storage->next_snapshot_min = LLONG_MAX;
         storage->next_snapshot_max = LLONG_MAX;
@@ -515,7 +519,7 @@ schedule_next_snapshot(struct ovsdb_storage *storage, bool quick)
 }
 
 bool
-ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage)
+ovsdb_storage_should_snapshot(struct ovsdb_storage *storage)
 {
     if (storage->raft || storage->log) {
         /* If we haven't reached the minimum snapshot time, don't snapshot. */
@@ -544,6 +548,15 @@ ovsdb_storage_should_snapshot(const struct ovsdb_storage *storage)
         }
 
         if (!snapshot_recommended) {
+            if (storage->raft) {
+                /* Re-scheduling with a quick retry in order to avoid condition
+                 * where all the raft servers passed the minimal time already,
+                 * but the log didn't grow a lot, so they are all checking on
+                 * every iteration. This will randomize the time of the next
+                 * attempt, so all the servers will not start snapshotting at
+                 * the same time when the log reaches a critical size. */
+                schedule_next_snapshot(storage, true);
+            }
             return false;
         }
 
diff --git a/ovsdb/storage.h b/ovsdb/storage.h
index e120094d7..ff026b77f 100644
--- a/ovsdb/storage.h
+++ b/ovsdb/storage.h
@@ -76,7 +76,7 @@ uint64_t ovsdb_write_get_commit_index(const struct ovsdb_write *);
 void ovsdb_write_wait(const struct ovsdb_write *);
 void ovsdb_write_destroy(struct ovsdb_write *);
 
-bool ovsdb_storage_should_snapshot(const struct ovsdb_storage *);
+bool ovsdb_storage_should_snapshot(struct ovsdb_storage *);
 struct ovsdb_error *ovsdb_storage_store_snapshot(struct ovsdb_storage *storage,
                                                  const struct json *schema,
                                                  const struct json *snapshot)