From 748010ff304b7cd2c43f4eb98a554433f0df07f9 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 24 Aug 2021 23:07:22 +0200
Subject: json: Optimize string serialization.

Current string serialization code puts all characters one by one.
This is slow because the dynamic string needs to perform length checks
on every ds_put_char(), and it also doesn't allow the compiler to use
better memory copy operations, i.e. doesn't allow copying several
bytes at once.

Special symbols are rare in a typical database.  Quotes are frequent,
but not too frequent.  In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So, it's better to count characters that don't require escaping
and use a fast data copy for the whole sequential block.

Testing with a synthetic benchmark (included) on my laptop shows the
following performance improvement:

       Size    Q   S        Before        After      Diff
  ---------------------------------------------------------
     100000    0   0 :    0.227 ms     0.142 ms   -37.4 %
     100000    2   1 :    0.277 ms     0.186 ms   -32.8 %
     100000   10   1 :    0.361 ms     0.309 ms   -14.4 %
   10000000    0   0 :   22.720 ms    12.160 ms   -46.4 %
   10000000    2   1 :   27.470 ms    19.300 ms   -29.7 %
   10000000   10   1 :   37.950 ms    31.250 ms   -17.6 %
  100000000    0   0 :  239.600 ms   126.700 ms   -47.1 %
  100000000    2   1 :  292.400 ms   188.600 ms   -35.4 %
  100000000   10   1 :  387.700 ms   321.200 ms   -17.1 %

Here Q is the probability (%) for a character to be a '\"' and
S is the probability (%) for it to be a special character ( < 32).

Testing with a closer to real world scenario shows an overall decrease
of the time needed for database compaction by ~5-10 %.  This change
also decreases CPU consumption in general, because string serialization
is used in many different places, including ovsdb monitors and raft.

Signed-off-by: Ilya Maximets
Acked-by: Numan Siddique
Acked-by: Dumitru Ceara
---
 lib/json.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

(limited to 'lib')

diff --git a/lib/json.c b/lib/json.c
index 32d25003b..bd524d66d 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -1696,14 +1696,30 @@ json_serialize_string(const char *string, struct ds *ds)
 {
     uint8_t c;
     uint8_t c2;
+    size_t count;
     const char *escape;
+    const char *start;
 
     ds_put_char(ds, '"');
+    count = 0;
+    start = string;
     while ((c = *string++) != '\0') {
-        escape = chars_escaping[c];
-        while ((c2 = *escape++) != '\0') {
-            ds_put_char(ds, c2);
+        if (c >= ' ' && c != '"' && c != '\\') {
+            count++;
+        } else {
+            if (count) {
+                ds_put_buffer(ds, start, count);
+                count = 0;
+            }
+            start = string;
+            escape = chars_escaping[c];
+            while ((c2 = *escape++) != '\0') {
+                ds_put_char(ds, c2);
+            }
         }
     }
+    if (count) {
+        ds_put_buffer(ds, start, count);
+    }
     ds_put_char(ds, '"');
 }
-- 
cgit v1.2.1
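
For readers who want to try the run-batching idea outside the OVS tree, here
is a minimal standalone sketch.  It is not the code from lib/json.c: 'struct
buf', buf_reserve(), buf_put(), escape_for() and serialize_string() are
hypothetical stand-ins for 'struct ds', ds_put_buffer() and the
chars_escaping[] table, and the exact escape spellings are an assumption.
Only the loop structure mirrors the patch: count characters that need no
escaping and copy each run with a single memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for OVS's 'struct ds'. */
struct buf {
    char *data;
    size_t len;
    size_t cap;
};

/* Makes room for 'n' more bytes plus a terminating NUL. */
static void
buf_reserve(struct buf *b, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1) {
            cap *= 2;
        }
        char *p = realloc(b->data, cap);
        if (!p) {
            abort();
        }
        b->data = p;
        b->cap = cap;
    }
}

/* Appends 'n' bytes with one capacity check and one memcpy(). */
static void
buf_put(struct buf *b, const char *src, size_t n)
{
    buf_reserve(b, n);
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Returns an escape sequence for 'c' (assumed to be '"', '\\' or < 32).
 * 'tmp' is scratch space for the "\uXXXX" form. */
static const char *
escape_for(uint8_t c, char tmp[7])
{
    switch (c) {
    case '"':  return "\\\"";
    case '\\': return "\\\\";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    default:
        snprintf(tmp, 7, "\\u%04x", c);
        return tmp;
    }
}

/* Serializes 's' as a quoted JSON string, copying unescaped runs in bulk. */
void
serialize_string(const char *s, struct buf *out)
{
    const char *start = s;   /* First byte of the current plain run. */
    size_t count = 0;        /* Length of the current plain run. */
    char tmp[7];
    uint8_t c;

    assert(s != NULL);
    buf_put(out, "\"", 1);
    while ((c = (uint8_t) *s++) != '\0') {
        if (c >= ' ' && c != '"' && c != '\\') {
            count++;
        } else {
            if (count) {
                buf_put(out, start, count);   /* Flush the plain run. */
                count = 0;
            }
            start = s;                        /* Next run starts after 'c'. */
            const char *esc = escape_for(c, tmp);
            buf_put(out, esc, strlen(esc));
        }
    }
    if (count) {
        buf_put(out, start, count);           /* Flush the trailing run. */
    }
    buf_put(out, "\"", 1);
}
```

A caller would zero-initialize a 'struct buf', call serialize_string() on a
NUL-terminated input, read the result from 'data', and free() it afterwards.
The gain comes from doing one capacity check per run instead of one per
character, which is why the improvement in the table above shrinks as the
share of quotes and special characters grows.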