author | Ilya Maximets <i.maximets@ovn.org> | 2020-10-24 02:25:48 +0200 |
---|---|---|
committer | Ilya Maximets <i.maximets@ovn.org> | 2020-11-03 13:01:33 +0100 |
commit | f38f98a2c0dd7fcaf20fbe11d1e67a9b2afc0b2a (patch) | |
tree | 9bfaaaeceb94e847edc8e005649654306e5de5de /configure.ac | |
parent | 7e38188160294df43dbbbc0cf6cfd42d02881fcf (diff) | |
download | openvswitch-f38f98a2c0dd7fcaf20fbe11d1e67a9b2afc0b2a.tar.gz |
ovsdb-server: Reclaim heap memory after compaction.
Compaction happens at most once every 10 minutes. That is a long
interval for a heavily loaded ovsdb-server in cluster mode.
In 10 minutes raft logs can grow to tens of thousands of entries
with tens of gigabytes in total size.
While compaction cleans up raft log entries, in many cases the memory
is not returned to the system but stays in the heap of the running
ovsdb-server process, and it can stay there for a really long time.
As a result, a single performance spike can cause fast growth of the
raft log, and this memory will not be released to the system for a
really long time (possibly never), even if the database is empty.
A simple way to reproduce this with the OVN sandbox:
1. make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'
2. Run the following script, which creates 1 port group, adds 4000 ACLs and
removes all of them at the end:
# cat ../memory-test.sh
pg_name=my_port_group
export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
ovn-nbctl pg-add $pg_name
for i in $(seq 1 4000); do
    echo "Iteration: $i"
    ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
done
ovn-nbctl acl-del $pg_name
ovn-nbctl pg-del $pg_name
ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
ovn-appctl -t ovn-nbctl exit
---
3. Stop one of the Northbound DB servers:
ovs-appctl -t $(pwd)/sandbox/nb1 exit
Make sure that ovsdb-server did not compact the database before
it was stopped. The db file on disk now contains
4000 fairly big transactions.
4. Start the same ovsdb-server with this file:
# cd sandbox && ovsdb-server <...> nb1.db
At this point ovsdb-server reads all the transactions from the db
file and executes them one by one as fast as it can.
When it finishes, the raft log contains 4000 entries and
ovsdb-server consumes (on my system) ~13GB of memory while the
database is empty. libc will likely never return this memory
to the system or, at least, will hold it for a really long time.
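The heap behaviour described above can be probed in isolation with a small
C sketch. It is not part of the patch: the helper name free_then_trim and
its parameters are made up for illustration, and the __GLIBC__ check is an
assumption used here only to keep the file compilable on other libcs. The
helper frees many small blocks while one later block stays allocated, then
asks glibc to trim; per the malloc_trim(3) documentation the call returns 1
if memory was released to the system and 0 otherwise.

```c
#include <stdlib.h>
#include <string.h>
#ifdef __GLIBC__
#include <malloc.h>   /* malloc_trim() is a glibc extension. */
#endif

/* Hypothetical helper, for illustration only.
 *
 * Allocates 'n_chunks' blocks of 'chunk_size' bytes, frees all of them
 * except the last one, and then asks the allocator to trim the heap.
 *
 * Returns the result of malloc_trim(0) (1 if memory was released to the
 * system, 0 if not), or -1 if malloc_trim() is not available or nothing
 * could be allocated. */
static int
free_then_trim(size_t n_chunks, size_t chunk_size)
{
    if (n_chunks == 0 || chunk_size == 0) {
        return -1;
    }

    char **chunks = calloc(n_chunks, sizeof *chunks);
    if (!chunks) {
        return -1;
    }

    size_t n = 0;
    for (; n < n_chunks; n++) {
        chunks[n] = malloc(chunk_size);
        if (!chunks[n]) {
            break;
        }
        memset(chunks[n], 0xab, chunk_size);  /* Touch the memory. */
    }
    if (n == 0) {
        free(chunks);
        return -1;
    }

    /* Free everything except the most recently allocated block. */
    for (size_t i = 0; i + 1 < n; i++) {
        free(chunks[i]);
    }

    int released = -1;
#ifdef __GLIBC__
    released = malloc_trim(0);
#endif

    free(chunks[n - 1]);
    free(chunks);
    return released;
}
```

Calling, for example, free_then_trim(10000, 1024) from a test program and
watching the process RSS before and after gives a rough feel for how much
freed memory the allocator was still holding. The numbers depend on the
glibc version and its tunables, so none are quoted here.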
This patch adds a new command 'ovsdb-server/memory-trim-on-compaction'.
It is disabled by default, but once enabled, ovsdb-server will call
'malloc_trim(0)' after every successful compaction to try to return
unused heap memory to the system. This is glibc-specific, so we
need to detect the function's availability at build time.
It is disabled by default because it adds from 1% to 30% (depending on the
current state) to the snapshot creation time and because subsequent memory
allocations will likely require requests to the kernel, which might be
slower. It could be enabled by default later if considered broadly
beneficial.
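A minimal sketch of how the build-time check can gate the call in C. This
is not the code from the patch: the flag and function names below are
hypothetical. It relies on the documented behaviour of AC_CHECK_DECLS,
which defines HAVE_DECL_MALLOC_TRIM to 1 or 0 in config.h. The __GLIBC__
fallback is an assumption added only so the sketch compiles without a
generated config.h.

```c
#include <stdbool.h>
#include <stdlib.h>

/* In a real build this macro comes from config.h, generated by
 * AC_CHECK_DECLS([malloc_trim], ...).  The fallback below is an assumption
 * that makes this standalone sketch compile without config.h. */
#ifndef HAVE_DECL_MALLOC_TRIM
#ifdef __GLIBC__
#define HAVE_DECL_MALLOC_TRIM 1
#else
#define HAVE_DECL_MALLOC_TRIM 0
#endif
#endif

#if HAVE_DECL_MALLOC_TRIM
#include <malloc.h>
#endif

/* Hypothetical runtime switch, off by default, standing in for the state
 * toggled by 'ovsdb-server/memory-trim-on-compaction'. */
static bool trim_on_compaction = false;

static void
set_trim_on_compaction(bool enable)
{
    trim_on_compaction = enable;
}

/* Hypothetical hook to call after a successful compaction.
 * Returns true if a trim was attempted, false if trimming is disabled or
 * malloc_trim() is not available in this build. */
static bool
trim_after_compaction(void)
{
#if HAVE_DECL_MALLOC_TRIM
    if (trim_on_compaction) {
        malloc_trim(0);   /* Try to return unused heap memory. */
        return true;
    }
#endif
    return false;
}
```

Because AC_CHECK_DECLS always defines the macro (to 0 when the declaration
is missing), the guard uses '#if' rather than '#ifdef'. On a libc without
malloc_trim() the hook compiles to a no-op that returns false.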
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1888829
Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Diffstat (limited to 'configure.ac')
-rw-r--r-- | configure.ac | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/configure.ac b/configure.ac
index 8d37af9db..126a1d9d1 100644
--- a/configure.ac
+++ b/configure.ac
@@ -100,6 +100,7 @@ OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
 OVS_CHECK_LINUX_AF_XDP
 AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
+AC_CHECK_DECLS([malloc_trim], [], [], [[#include <malloc.h>]])
 AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
   [], [], [[#include <sys/stat.h>]])
 AC_CHECK_MEMBERS([struct ifreq.ifr_flagshigh], [], [], [[#include <net/if.h>]])