Reclaim page cache of RDB file (#11248)

# Background
The RDB file is usually generated and used once and seldom used again, but its content stays in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim memory from the page cache or swap out anonymous pages, which may cause jitter in the Redis service.

Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we are upgrading them all together. The page cache on the host machine grows as RDBs are generated. Once free memory drops to the low watermark, a `direct reclaim` happens, which means the process stalls waiting for memory allocation. This is more likely on older Linux kernels such as 3.10: before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, the `low watermark` was linear to the `min watermark`, so there was little headroom for `kswapd` to be woken up to reclaim memory.

# What the PR does
The PR introduces the capability to reclaim the page cache when the RDB is read or written.

For reads, incremental reclaim is a little messy to implement, so the reclaim is done in one go, in the background, after the load has finished, to avoid blocking the worker thread.

For writes, incremental reclaim amortizes the cost of reclaiming, so there is no need to move it to the background, and the peak cache watermark is reduced this way.

Two cases are handled specially: replication and restart. In both, the cache is leveraged to speed up processing, so the reclaim is postponed to an appropriate time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with a default value of false.

# Things worth noting
1. Although `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0.
2. On Linux, `posix_fadvise` only takes effect on pages that have already been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush the dirty pages before calling `posix_fadvise` when reclaiming write cache.

# About tests
A unit test is added to verify the effect of `posix_fadvise`. In the integration tests, the overall cache increase is checked, as well as the cache backed by the RDB file, with a specific TCL test executed in an isolated GitHub Actions job.
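
The flush-then-advise ordering described in the note about Linux can be sketched in a few lines of C. This is a minimal illustration, not the code from the PR: the helper name `flush_and_drop_cache` is hypothetical, it uses `fsync` where the real code may use another flush primitive, and on platforms that do not define `POSIX_FADV_DONTNEED` it only flushes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of Redis): flush the dirty pages of fd,
 * then hint to the kernel that the cached pages of the whole file are no
 * longer needed. Returns 0 on success, -1 on failure with errno set. */
int flush_and_drop_cache(int fd) {
    /* Dirty pages are skipped by POSIX_FADV_DONTNEED on Linux, so they
     * have to be written back first. */
    if (fsync(fd) == -1) return -1;

#ifdef POSIX_FADV_DONTNEED
    /* offset = 0 and len = 0 cover the file from start to end.
     * posix_fadvise returns the error number directly instead of
     * setting errno. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0) {
        errno = err;
        return -1;
    }
#endif
    return 0;
}
```

The call is only advisory: the kernel is free to keep pages that are still in use or mapped elsewhere, so a successful return does not guarantee that every page was dropped.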
author: Tian <skylypig@gmail.com> 2023-02-12 15:23:29 +0800
committer: GitHub <noreply@github.com> 2023-02-12 09:23:29 +0200
commit: 7dae142a2ebf909a63df13e5813c073c79be521f (patch)
tree: 28df352d6c4711a669acf171d5a07319528474df /src/rio.c
parent: 5c3938d5cc08b42acc99f314d92f9e0d5671f96e (diff)
download: redis-7dae142a2ebf909a63df13e5813c073c79be521f.tar.gz
1 files changed, 20 insertions, 0 deletions
diff --git a/src/rio.c b/src/rio.c
index de4713fec..eaf88d25f 100644
--- a/src/rio.c
+++ b/src/rio.c
@@ -151,6 +151,16 @@ static size_t rioFileWrite(rio *r, const void *buf, size_t len) {
 #else
             if (redis_fsync(fileno(r->io.file.fp)) == -1) return 0;
 #endif
+            if (r->io.file.reclaim_cache) {
+                /* In Linux sync_file_range just issue a writeback request to
+                 * OS, and when posix_fadvise is called, the dirty page may
+                 * still be in flushing, which means it would be ignored by
+                 * posix_fadvise.
+                 * 
+                 * So we posix_fadvise the whole file, and the writeback-ed 
+                 * pages will have other chances to be reclaimed. */
+                reclaimFilePageCache(fileno(r->io.file.fp), 0, 0);
+            }
             r->io.file.buffered = 0;
         }
     }
@@ -191,6 +201,7 @@ void rioInitWithFile(rio *r, FILE *fp) {
     r->io.file.fp = fp;
     r->io.file.buffered = 0;
     r->io.file.autosync = 0;
+    r->io.file.reclaim_cache = 0;
 }
 
 /* ------------------- Connection implementation -------------------
@@ -439,6 +450,15 @@ void rioSetAutoSync(rio *r, off_t bytes) {
     r->io.file.autosync = bytes;
 }
 
+/* Set the file-based rio object to reclaim cache after every auto-sync.
+ * In the Linux implementation POSIX_FADV_DONTNEED skips the dirty
+ * pages, so if auto sync is unset this option will have no effect.
+ * 
+ * This feature can reduce the cache footprint backed by the file. */
+void rioSetReclaimCache(rio *r, int enabled) {
+    r->io.file.reclaim_cache = enabled;
+}
+
 /* Check the type of rio. */
 uint8_t rioCheckType(rio *r) {
     if (r->read == rioFileRead) {
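
To illustrate how the reclaim piggybacks on auto-sync in the write path, here is a standalone sketch of the same pattern on a raw file descriptor. It is a simplification under stated assumptions, not the `rio` implementation: the `chunk_writer` type and `chunk_writer_write` function are hypothetical, `fsync` stands in for whatever flush the platform-specific code performs, and the `syncs` counter exists only to make the behaviour observable.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical writer state, loosely modelled on the file-based rio fields. */
typedef struct {
    int fd;            /* destination file descriptor */
    size_t autosync;   /* flush after this many bytes; 0 disables */
    size_t buffered;   /* bytes written since the last flush */
    int reclaim_cache; /* non-zero: drop page cache after each flush */
    unsigned syncs;    /* number of flush points reached (illustration) */
} chunk_writer;

/* Write len bytes, never letting more than autosync bytes accumulate
 * between flushes. Returns 1 on success, 0 on failure. */
int chunk_writer_write(chunk_writer *w, const void *buf, size_t len) {
    const char *p = buf;
    size_t nwritten = 0;

    while (nwritten < len) {
        size_t towrite = len - nwritten;
        if (w->autosync) {
            /* Stop at the next flush boundary. */
            size_t room = w->autosync - w->buffered;
            if (towrite > room) towrite = room;
        }

        ssize_t n = write(w->fd, p + nwritten, towrite);
        if (n <= 0) return 0;
        nwritten += (size_t)n;
        w->buffered += (size_t)n;

        if (w->autosync && w->buffered >= w->autosync) {
            if (fsync(w->fd) == -1) return 0;
#ifdef POSIX_FADV_DONTNEED
            /* Advise on the whole file so that pages still under
             * writeback at an earlier flush point get another chance. */
            if (w->reclaim_cache)
                (void)posix_fadvise(w->fd, 0, 0, POSIX_FADV_DONTNEED);
#endif
            w->buffered = 0;
            w->syncs++;
        }
    }
    return 1;
}
```

With `autosync` left at 0 the reclaim branch is never reached, which matches the comment on `rioSetReclaimCache`: without auto-sync the option has no effect.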