ovsdb: Use column diffs for ovsdb and raft log entries.

Currently, ovsdb-server stores the complete value of a column in the database file and in the raft log whenever that column changes. This means that a transaction that adds, for example, one new acl to a port group creates a log entry with the UUIDs of all existing acls plus the new one. The same holds for ports in logical switches and routers and for many other columns with sets in the Northbound DB.

There could be thousands of acls in one port group or thousands of ports in a single logical switch. The typical use case is to add one new entry when starting a new service/VM/container or adding one new node to a kubernetes or OpenStack cluster. This generates a huge amount of traffic within the ovsdb raft cluster, grows overall memory consumption and hurts performance, since all these UUIDs are parsed from and formatted to json several times and stored on disk. The more values a set holds, the more space a single log entry occupies and the longer ovsdb-server cluster members take to process it.

Simple test:

1. Start OVN sandbox with clustered DBs:
   # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run a script that creates one port group and adds 4000 acls into it:
   # cat ../memory-test.sh
   pg_name=my_port_group
   export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file -vsocket_util:off)
   ovn-nbctl pg-add $pg_name
   for i in $(seq 1 4000); do
     echo "Iteration: $i"
     ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
   done
   ovn-nbctl acl-del $pg_name
   ovn-nbctl pg-del $pg_name
   ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
   ovn-appctl -t ovn-nbctl exit
   ---

4. Check the current memory consumption of ovsdb-server processes and the space occupied by database files:
   # ls sandbox/[ns]b*.db -alh
   # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'

Test results with current ovsdb log format:

  On-disk Nb DB size     :  ~369 MB
  RSS of Nb ovsdb-servers:  ~2.7 GB
  Time to finish the test:  ~2m

In order to mitigate memory consumption issues and reduce the computational load on ovsdb-servers, let's store the diff between old and new values instead. This makes each log entry that adds a single acl to a port group (or a port to a logical switch, or anything else like that) very small and independent of the number of already existing acls (ports, etc.).

Added a new marker '_is_diff' to the file transaction to specify that this transaction contains diffs instead of replacements for the existing data.

One side effect is that this change will actually increase the size of a file transaction that removes more than half of the entries from a set, because the diff will be larger than the resulting new value. However, such operations are rare.

Test results with change applied:

  On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
  RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
  Time to finish the test:  ~1m27s   ---> reduced by 27%

After this change a new ovsdb-server is still able to read old databases, but an old ovsdb-server will not be able to read new ones. Since new servers can join an ovsdb cluster dynamically, it's hard to implement any runtime mechanism to handle cases where different versions of ovsdb-server join the cluster. However, we still need to handle cluster upgrades. For this case, added a special command line argument to disable the new functionality. Documentation updated with the recommended way to upgrade the ovsdb cluster.

Acked-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
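
The size argument in the commit message can be illustrated with a small standalone sketch. It assumes that the diff of a set column behaves like a symmetric difference (elements present in exactly one of the old and new values), so that applying the diff to the old value yields the new one. The helper set_diff() and the use of integers in place of UUIDs are hypothetical and are not taken from the OVS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OVS.  Writes the symmetric difference of
 * two sorted, duplicate-free arrays 'a' and 'b' into 'out', which must have
 * room for na + nb elements, and returns the number of elements written.
 * Integers stand in for the UUIDs stored in a set column. */
static size_t
set_diff(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    while (i < na && j < nb) {
        if (a[i] == b[j]) {
            /* Present in both sets: not part of the diff. */
            i++;
            j++;
        } else if (a[i] < b[j]) {
            out[n++] = a[i++];      /* Only in 'a'. */
        } else {
            out[n++] = b[j++];      /* Only in 'b'. */
        }
    }
    while (i < na) {
        out[n++] = a[i++];
    }
    while (j < nb) {
        out[n++] = b[j++];
    }
    return n;
}
```

Under this assumption, adding one element to a set of 4000 gives a diff of one element, and calling set_diff() on the old value and that diff reproduces the new value. Removing three of four elements gives a diff of three elements while the new value has only one, which is the side effect the commit message mentions.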
author: Ilya Maximets <i.maximets@ovn.org> 2020-12-11 21:54:47 +0100
committer: Ilya Maximets <i.maximets@ovn.org> 2021-01-15 19:23:02 +0100
commit: 2ccd66f594f7a5fdc39028f8c7473e11d2329a11 (patch)
tree: 4226fe03c9c33033f65fa083071ede5f8428fefa /ovsdb/ovsdb-tool.c
parent: 980bca70799da3d186c568f26b72a9774043d6ef (diff)
download: openvswitch-2ccd66f594f7a5fdc39028f8c7473e11d2329a11.tar.gz
1 files changed, 7 insertions, 1 deletions
diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index 1b49b6fc8..b8560f850 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -391,6 +391,9 @@ compact_or_convert(const char *src_name_, const char *dst_name_,
         ovs_fatal(retval, "%s: failed to lock lockfile", dst_name);
     }
 
+    /* Resulted DB will contain a single transaction without diff anyway. */
+    ovsdb_file_column_diff_disable();
+
     /* Save a copy. */
     struct ovsdb *ovsdb = (new_schema
                            ? ovsdb_file_read_as_schema(src_name, new_schema)
@@ -648,6 +651,8 @@ static void
 print_db_changes(struct shash *tables, struct smap *names,
                  const struct ovsdb_schema *schema)
 {
+    struct json *is_diff = shash_find_data(tables, "_is_diff");
+    bool diff = (is_diff && is_diff->type == JSON_TRUE);
     struct shash_node *n1;
 
     int i = 0;
@@ -691,7 +696,8 @@ print_db_changes(struct shash *tables, struct smap *names,
                     printf(" insert row %.8s:\n", row_uuid);
                 }
             } else {
-                printf(" row %s (%.8s):\n", old_name, row_uuid);
+                printf(" row %s (%.8s)%s:\n", old_name, row_uuid,
+                                              diff ? " diff" : "");
             }
 
             if (columns->type == JSON_OBJECT) {
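
The effect of the printf change in print_db_changes() can be tried in isolation with the sketch below. It reuses the format string from the patch, but format_row_header() and its parameters are hypothetical names introduced only for illustration; the real code reads the '_is_diff' marker from the transaction's json rather than taking a boolean argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of ovsdb-tool.  Formats a row header the way
 * the patched printf does: the row name, the first 8 characters of the row
 * UUID, and a " diff" suffix when the transaction is marked as a diff.
 * Returns the value of snprintf(). */
static int
format_row_header(char *buf, size_t size, const char *name,
                  const char *row_uuid, bool is_diff)
{
    return snprintf(buf, size, " row %s (%.8s)%s:\n", name, row_uuid,
                    is_diff ? " diff" : "");
}
```

With is_diff set to false the output matches the format string used before the patch, so transactions without the marker are printed as before.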