author Tian <skylypig@gmail.com> 2023-02-12 15:23:29 +0800
committer GitHub <noreply@github.com> 2023-02-12 09:23:29 +0200
commit 7dae142a2ebf909a63df13e5813c073c79be521f
tree 28df352d6c4711a669acf171d5a07319528474df /src/bio.c
parent 5c3938d5cc08b42acc99f314d92f9e0d5671f96e
Reclaim page cache of RDB file (#11248)
# Background
The RDB file is usually generated and used once and seldom used again, but its content stays in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim memory from the page cache or swap anonymous pages out, which may cause jitter in the Redis service.

Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we are upgrading them all together. The page cache on the host grows as RDBs are generated. Once free memory drops to the low watermark, a `direct reclaim` happens, which means the process stalls while waiting for memory allocation. This is more likely on older Linux kernels such as 3.10: before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, the `low watermark` was linear to the `min watermark`, which left little buffer space for `kswapd` to be woken up to reclaim memory.

# What the PR does
The PR introduces the ability to reclaim the page cache when the RDB is operated on. There are two general cases: reading and writing the RDB. For reads, incremental reclaim is a little messy to implement, so the reclaim is done in one go in the background after the load has finished, to avoid blocking the work thread. For writes, incremental reclaim amortizes the cost of reclaiming, so there is no need to move it to the background, and the peak watermark of the cache is reduced this way.

Two cases are handled specially: replication and restart. In both, the cache is leveraged to speed up processing, so the reclaim is postponed to a suitable time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with the default value false.

# Things worth noting
1. Although `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0.
2. On Linux, `posix_fadvise` only takes effect on pages that have already been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush dirty pages before `posix_fadvise` when reclaiming the write cache.

# About test
A unit test is added to verify the effect of `posix_fadvise`. In the integration test, the overall cache increase is checked, as well as the cache backed by the RDB, with a specific TCL test executed in an isolated GitHub Actions job.
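The incremental reclaim on the write path described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code added by the PR: the helper names `drop_cache_range` and `write_with_incremental_reclaim`, the 64 KiB chunk size, and the `reclaim_every` parameter are all hypothetical. It only shows the ordering that note 2 calls for (flush dirty pages, then advise `POSIX_FADV_DONTNEED`), and it treats reclaim as best effort.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: flush dirty pages, then ask the kernel to drop the
 * cached pages in [offset, offset+len). Returns 0 on success, -1 on error.
 * On platforms without POSIX_FADV_DONTNEED it is a no-op. */
static int drop_cache_range(int fd, off_t offset, off_t len) {
#ifdef POSIX_FADV_DONTNEED
    /* Dirty pages are not dropped, so write them back first. */
    if (fdatasync(fd) == -1) return -1;
    /* posix_fadvise returns the error number directly and does not set errno. */
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    if (err != 0) { errno = err; return -1; }
    return 0;
#else
    (void)fd; (void)offset; (void)len;
    return 0;
#endif
}

/* Hypothetical writer: emits `total` bytes of filler data and reclaims the
 * cache each time `reclaim_every` new bytes have accumulated, so the cache
 * held by this file stays bounded instead of growing with the file. */
static int write_with_incremental_reclaim(int fd, size_t total, size_t reclaim_every) {
    static char chunk[64 * 1024];
    size_t written = 0, reclaimed = 0;

    assert(reclaim_every > 0);
    memset(chunk, 'x', sizeof(chunk));
    while (written < total) {
        size_t want = total - written;
        if (want > sizeof(chunk)) want = sizeof(chunk);
        ssize_t n = write(fd, chunk, want);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        written += (size_t)n;
        if (written - reclaimed >= reclaim_every) {
            /* Best effort: a failed reclaim does not fail the write. */
            (void)drop_cache_range(fd, (off_t)reclaimed, (off_t)(written - reclaimed));
            reclaimed = written;
        }
    }
    return 0;
}
```

A caller would open the target file and call, for example, `write_with_incremental_reclaim(fd, 3u << 20, 1u << 20)` to write 3 MiB while reclaiming after every 1 MiB. Spreading the work this way keeps each reclaim step small, which is why the commit message says no background thread is needed on the write path.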
Diffstat (limited to 'src/bio.c')
-rw-r--r-- src/bio.c 10
1 files changed, 9 insertions, 1 deletions
diff --git a/src/bio.c b/src/bio.c
index b8c73d528..7eb43a3a3 100644
--- a/src/bio.c
+++ b/src/bio.c
@@ -74,6 +74,8 @@ typedef union bio_job {
         int fd; /* Fd for file based background jobs */
         unsigned need_fsync:1; /* A flag to indicate that a fsync is required before
                                 * the file is closed. */
+        unsigned need_reclaim_cache:1; /* A flag to indicate that reclaim cache is required before
+                                * the file is closed. */
     } fd_args;
 
     struct {
@@ -144,10 +146,11 @@ void bioCreateLazyFreeJob(lazy_free_fn free_fn, int arg_count, ...) {
     bioSubmitJob(BIO_LAZY_FREE, job);
 }
 
-void bioCreateCloseJob(int fd, int need_fsync) {
+void bioCreateCloseJob(int fd, int need_fsync, int need_reclaim_cache) {
     bio_job *job = zmalloc(sizeof(*job));
     job->fd_args.fd = fd;
     job->fd_args.need_fsync = need_fsync;
+    job->fd_args.need_reclaim_cache = need_reclaim_cache;
 
     bioSubmitJob(BIO_CLOSE_FILE, job);
 }
@@ -216,6 +219,11 @@ void *bioProcessBackgroundJobs(void *arg) {
             if (job->fd_args.need_fsync) {
                 redis_fsync(job->fd_args.fd);
             }
+            if (job->fd_args.need_reclaim_cache) {
+                if (reclaimFilePageCache(job->fd_args.fd, 0, 0) == -1) {
+                    serverLog(LL_NOTICE,"Unable to reclaim page cache: %s", strerror(errno));
+                }
+            }
             close(job->fd_args.fd);
         } else if (type == BIO_AOF_FSYNC) {
             /* The fd may be closed by main thread and reused for another
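The order of operations in the `BIO_CLOSE_FILE` branch above (optional fsync, optional whole-file reclaim, then close) can be sketched in isolation as below. This is a simplified stand-in, not Redis code: `close_job`, `reclaim_whole_file`, and `run_close_job` are hypothetical names, there is no background thread, and the body of `reclaimFilePageCache` is not shown in this diff, so the sketch assumes it wraps `posix_fadvise` with `POSIX_FADV_DONTNEED`, where a length of 0 means "to the end of the file".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified counterpart of the fd_args member of bio_job. */
typedef struct close_job {
    int fd;
    unsigned need_fsync:1;         /* fsync before closing */
    unsigned need_reclaim_cache:1; /* drop page cache before closing */
} close_job;

/* Assumed behaviour of a whole-file reclaim: offset 0, length 0 covers the
 * entire file. Returns 0 on success, -1 with errno set on failure. */
static int reclaim_whole_file(int fd) {
#ifdef POSIX_FADV_DONTNEED
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0) { errno = err; return -1; }
#else
    (void)fd;
#endif
    return 0;
}

/* Runs the job in the same order as the diff: fsync, reclaim, close.
 * A failed reclaim is only reported; the fd is closed regardless. */
static int run_close_job(const close_job *job) {
    assert(job != NULL);
    if (job->need_fsync && fsync(job->fd) == -1)
        fprintf(stderr, "fsync failed: %s\n", strerror(errno));
    if (job->need_reclaim_cache && reclaim_whole_file(job->fd) == -1)
        fprintf(stderr, "Unable to reclaim page cache: %s\n", strerror(errno));
    return close(job->fd);
}
```

Doing the fsync first matters when the file still has dirty pages, since the commit message notes that on Linux only written-back pages are dropped. Passing both flags, as in `close_job job = { fd, 1, 1 };`, covers that case.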