ovsdb-server: Reclaim heap memory after compaction.

Compaction happens at most once in 10 minutes.  That is a long interval
for a heavily loaded ovsdb-server in cluster mode.  In 10 minutes raft
logs could grow up to tens of thousands of entries with tens of
gigabytes in total size.  While compaction cleans up raft log entries,
in many cases the memory is not returned to the system but kept in the
heap of the running ovsdb-server process, and it could stay there for a
really long time.  As a result, a single performance spike could cause
fast growth of the raft log, and this memory will not be released to
the system for a really long time (possibly never), even if the
database is empty.

A simple way to reproduce this with the OVN sandbox:

1. make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run the following script, which creates 1 port group, adds 4000 ACLs
   and removes all of that in the end:

   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

3. Stop one of the Northbound DB servers:

   ovs-appctl -t $(pwd)/sandbox/nb1 exit

   Make sure that ovsdb-server didn't compact the database before it
   was stopped.  Now we have a db file on disk that contains 4000
   fairly big transactions.

4. Start the same ovsdb-server with this file:

   # cd sandbox && ovsdb-server <...> nb1.db

   At this point ovsdb-server reads all the transactions from the db
   file and executes them one by one as fast as it can.  When it
   finishes, the raft log contains 4000 entries and ovsdb-server
   consumes (on my system) ~13GB of memory while the database is empty.
   libc will likely never return this memory to the system or, at
   least, will hold it for a really long time.

This patch adds a new command 'ovsdb-server/memory-trim-on-compaction'.
It is disabled by default, but once enabled, ovsdb-server will call
'malloc_trim(0)' after every successful compaction to try to return
unused heap memory to the system.  This is glibc-specific, so we need
to detect the availability of the function at build time.

It is disabled by default because it adds from 1% to 30% (depending on
the current state) to the snapshot creation time and, in addition,
subsequent memory allocations will likely require requests to the
kernel, which might be slower.  It could be enabled by default later if
considered broadly beneficial.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
author: Ilya Maximets <i.maximets@ovn.org> 2020-10-24 02:25:48 +0200
committer: Ilya Maximets <i.maximets@ovn.org> 2020-11-03 13:01:33 +0100
commit: f38f98a2c0dd7fcaf20fbe11d1e67a9b2afc0b2a (patch)
tree: 9bfaaaeceb94e847edc8e005649654306e5de5de /configure.ac
parent: 7e38188160294df43dbbbc0cf6cfd42d02881fcf (diff)
download: openvswitch-f38f98a2c0dd7fcaf20fbe11d1e67a9b2afc0b2a.tar.gz
1 files changed, 1 insertions, 0 deletions
diff --git a/configure.ac b/configure.ac
index 8d37af9db..126a1d9d1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -100,6 +100,7 @@ OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
 OVS_CHECK_LINUX_AF_XDP
 AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
+AC_CHECK_DECLS([malloc_trim], [], [], [[#include <malloc.h>]])
 AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
   [], [], [[#include <sys/stat.h>]])
 AC_CHECK_MEMBERS([struct ifreq.ifr_flagshigh], [], [], [[#include <net/if.h>]])
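
The AC_CHECK_DECLS line added above makes configure define HAVE_DECL_MALLOC_TRIM to 1 or 0 in config.h, so C code can guard the glibc-only call at compile time. The following is a minimal sketch of that pattern, not the code from this patch: the names memory_trim_set(), memory_trim_after_compaction() and trim_memory_enabled are hypothetical, and the fallback definition of HAVE_DECL_MALLOC_TRIM exists only so the sketch compiles without a generated config.h (it assumes that only glibc declares malloc_trim()).

```c
#include <stdbool.h>
#include <stdlib.h>

/* In an autoconf build, config.h defines HAVE_DECL_MALLOC_TRIM to 1 or 0.
 * Assumption for this standalone sketch only: without config.h, treat
 * glibc as the sole libc that declares malloc_trim(). */
#ifndef HAVE_DECL_MALLOC_TRIM
#ifdef __GLIBC__
#define HAVE_DECL_MALLOC_TRIM 1
#else
#define HAVE_DECL_MALLOC_TRIM 0
#endif
#endif

#if HAVE_DECL_MALLOC_TRIM
#include <malloc.h>

/* Hypothetical flag; stands in for the state toggled by the new command.
 * Off by default, matching the default described in the commit message. */
static bool trim_memory_enabled = false;
#endif

/* Hypothetical helper: enables or disables trimming.
 * Returns false if this build has no malloc_trim() to call. */
bool
memory_trim_set(bool enable)
{
#if HAVE_DECL_MALLOC_TRIM
    trim_memory_enabled = enable;
    return true;
#else
    (void) enable;
    return false;
#endif
}

/* Hypothetical hook, meant to run right after a compaction attempt.
 * Returns true if malloc_trim(0) was called, false otherwise. */
bool
memory_trim_after_compaction(bool compaction_ok)
{
#if HAVE_DECL_MALLOC_TRIM
    if (compaction_ok && trim_memory_enabled) {
        /* Ask glibc to return free heap memory to the kernel. */
        malloc_trim(0);
        return true;
    }
#else
    (void) compaction_ok;
#endif
    return false;
}
```

In a real tree, config.h would be included first and the fallback block dropped. Checking the declaration with AC_CHECK_DECLS rather than testing for glibc directly keeps the decision in configure, so the C code only needs a single `#if`.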