delta/redis.git - github.com: antirez/redis.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Reclaim page cache of RDB file (#11248)	Tian	2023-02-12	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	# Background The RDB file is usually generated and used once and seldom used again, but its content stays in the page cache until the OS evicts it. A potential problem is that once free memory is exhausted, the OS has to reclaim memory from the page cache or swap anonymous pages out, which may cause jitter in the Redis service. Consider a concrete scenario: a high-capacity machine hosts many Redis instances, and we upgrade them all together. The page cache on the host grows as RDBs are generated. Once free memory drops to the low watermark (which is more likely on older Linux kernels such as 3.10; before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) was introduced, the `low watermark` was linear in the `min watermark`, leaving little headroom for `kswapd` to be woken up to reclaim memory), a `direct reclaim` happens, which means the process stalls waiting for memory allocation. # What the PR does The PR introduces the ability to reclaim the cache when the RDB is read or written. For reads, incremental reclaim is a little messy to implement, so the reclaim is done in one go in the background after the load finishes, to avoid blocking the work thread. For writes, incremental reclaim amortizes the work, so there is no need to move it to the background, and the peak cache watermark is reduced this way. Two cases are handled specially, replication and restart: in both, the cache is leveraged to speed up processing, so the reclaim is postponed to an appropriate time. To do this, a flag is added to `rdbSave` and `rdbLoad` to control whether the cache needs to be kept, with a default value of false. # Things worth noting 1. Although `posix_fadvise` is part of the POSIX standard, only a few platforms support it, e.g. Linux and FreeBSD 10.0. 2. On Linux, `posix_fadvise` only takes effect on pages that have been written back, so a `sync` (or `fsync`, `fdatasync`) is needed to flush dirty pages before `posix_fadvise` when reclaiming the write cache. # About tests A unit test is added to verify the effect of `posix_fadvise`. In the integration tests, the overall cache increase is checked, as well as the cache backed by the RDB, as a specific TCL test is executed in an isolated GitHub Actions job.
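The write-side reclaim described in this message can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `reclaim_file_cache` is a hypothetical helper name, and the sketch assumes a platform that provides `posix_fadvise` (for example Linux). It flushes dirty pages first, then advises the kernel that the cached range is no longer needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not a Redis function): ask the kernel to drop the
 * page cache backing [offset, offset+len) of fd. A len of 0 means "up to
 * the end of the file". Returns 0 on success, -1 on failure with errno set. */
int reclaim_file_cache(int fd, off_t offset, off_t len) {
    assert(fd >= 0);
    /* Only written-back pages can be dropped, so flush dirty pages first. */
    if (fdatasync(fd) == -1) return -1;
    /* posix_fadvise returns the error number instead of setting errno. */
    int err = posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

Calling the helper after each written chunk would correspond to the incremental variant mentioned for writes; calling it once with `offset` and `len` set to 0 corresponds to the one-shot variant mentioned for reads.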
*	optimizing d2string() and addReplyDouble() with grisu2: double to string ↵	filipe oliveira	2022-10-15	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	conversion based on Florian Loitsch's Grisu-algorithm (#10587) All commands and use cases that rely heavily on converting a double to its string representation (i.e. taking a double-precision floating-point number like 1.5 and returning a string like "1.5") could benefit from a performance boost by replacing snprintf(buf,len,"%.17g",value) with the equivalent [fpconv_dtoa](https://github.com/night-shift/fpconv) or any other algorithm that ensures 100% coverage of conversion. This is a well-studied topic, and projects like MongoDB, RedPanda and PyTorch leverage libraries (fmtlib) that use the optimized double-to-string conversion underneath. The positive impact can be substantial. This PR uses the grisu2 approach (grisu is explained in section 5 of https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf). Test suite changes: despite being compatible, in some cases it produces a different result from printf, and some tests had to be adjusted. One case is that `%.17g` (which means %e or %f, whichever is shorter) chose to use `5000000000` instead of 5e+9, which sounds like a bug? In other cases, we changed TCL to compare numbers instead of strings to ignore minor rounding issues (`expr 0.8 == 0.79999999999999999`)
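For reference, the `snprintf`-based baseline named in this message can be exercised with a small sketch. The helper names `double_to_string_17g` and `formats_as` are hypothetical and only wrap the standard `snprintf` call; the grisu2 replacement itself is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Baseline conversion named in the commit message: snprintf with "%.17g".
 * Returns the number of characters written, excluding the terminator. */
int double_to_string_17g(char *buf, size_t len, double value) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%.17g", value);
}

/* Returns 1 if value formats to exactly the expected string, else 0. */
int formats_as(double value, const char *expected) {
    char buf[32]; /* large enough for any "%.17g" result */
    double_to_string_17g(buf, sizeof(buf), value);
    return strcmp(buf, expected) == 0;
}
```

With a conforming C library, `formats_as(1.5, "1.5")`, `formats_as(5e9, "5000000000")` and `formats_as(0.8, "0.80000000000000004")` all return 1. The last case shows the kind of trailing digits that make comparing numbers more robust than comparing strings in tests.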
*	Adds isolated netstats for replication. (#10062)	DarrenJiang13	2022-05-31	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The values of `server.stat_net_output_bytes` and `server.stat_net_input_bytes` are actually the sum of replication traffic and user data traffic. This can cause confusion such as: "Why does my server report such a large output_bytes while I am doing nothing?". After discussion and revisions, here is what this PR brings (final version before merge): - 2 server variables to count the network bytes during replication, including fullsync and propagate bytes. - `server.stat_net_repl_output_bytes`/`server.stat_net_repl_input_bytes` - 3 info fields to print the input and output of repl bytes and the instantaneous value of total repl bytes. - `total_net_repl_input_bytes` / `total_net_repl_output_bytes` - `instantaneous_repl_total_kbps` - 1 new API `rioCheckType()` to check the type of a rio, so we can use it to distinguish between diskless and disk-based replication - 2 new counting items to keep network statistics consistent between master and slave - the rdb portion during diskless replica, in `rdbLoadProgressCallback()` - the first line of the full sync payload, in `readSyncBulkPayload()` Co-authored-by: Oran Agra <oran@redislabs.com>
*	Fix when the master connection is disconnected, replication retry read ↵	sundb	2021-12-31	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	indefinitely (#10032) Now if redis is still loading when we receive sigterm, we will wait for the loading to reach the event loop (once in 2mb) before actually shutting down. See #10003. This change caused valgrind CI to fail. See https://github.com/redis/redis/runs/4662901673?check_suite_focus=true This PR mainly solves the problem that the redis process cannot exit normally. When the master is disconnected while the replica is doing a diskless load and using `connRead` to read data from the master, it may enter an infinite retry loop, because the code does not handle `connRead` returning 0 (master connection disconnected).
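The failure mode described here, a read loop that never treats a return value of 0 as end of stream, can be illustrated with plain `read()`. This is a generic sketch, not the Redis `connRead` code path; `read_exact` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from a blocking descriptor.
 * Returns 1 on success, 0 on error or when the peer closed the stream. */
int read_exact(int fd, void *buf, size_t len) {
    char *p = buf;
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) return 0; /* peer closed: stop instead of retrying forever */
        if (n < 0) return 0;  /* errno was set by read() */
        p += n;
        len -= (size_t)n;
    }
    return 1;
}
```

Without the `n == 0` branch, a caller that only retries on "no progress" would spin forever once the other side disconnects.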
*	Retry when a blocked connection system call is interrupted by a signal (#9629)	menwen	2021-11-04	1	-0/+2
\| \| \| \| \| \| \|	When repl-diskless-load is enabled, the connection is set to the blocking state. The connection may be interrupted by a signal during a system call. This would have resulted in a disconnection and possibly a reconnection loop. Co-authored-by: Oran Agra <oran@redislabs.com>
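Retrying a blocking call that was interrupted by a signal usually takes the following shape. This is a minimal sketch using plain `read()`; `read_retry_eintr` is a hypothetical name and this is not the code added by the commit.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: like read(), but transparently restarts the call
 * when it fails with EINTR because a signal handler ran. */
ssize_t read_retry_eintr(int fd, void *buf, size_t len) {
    ssize_t n;
    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

Any other error, as well as a return value of 0, is passed through to the caller unchanged.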
*	Use sync_file_range to optimize fsync if possible (#9409)	Wang Yuan	2021-08-30	1	-10/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We implement incremental data sync in rio.c by calling fsync. On slow disks that may take a lot of time; sync_file_range can provide an asynchronous fsync, so we can serialize keys/values and sync file data at the same time. > one tip for sync_file_range usage: http://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html Additionally, this change avoids issuing a single large write, which can result in a mass of dirty pages in the kernel (increasing the risk of blocking someone else's write). On HDD, this solution cuts the RDB dump time roughly in half: this PR takes 50s to dump a 7.7G rdb, while the unstable branch takes 93s. On NVMe SSD, this PR does not save much time: it takes 40s, while the unstable branch takes 48s. Moreover, I find that calling data sync every 4MB is better than every 32MB.
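The idea of starting writeback for each chunk while the writer keeps producing data can be sketched as below. This is an assumption-laden illustration, not the rio.c implementation: `write_with_incremental_sync` and `SYNC_STEP_BYTES` are hypothetical names, the 4 MB step simply follows the remark in the message, and the fallback to `fdatasync` is a choice made for this sketch. `sync_file_range` is Linux-specific.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumption: 4 MB per step, following the remark in the commit message. */
#define SYNC_STEP_BYTES ((size_t)4 * 1024 * 1024)

/* Hypothetical helper: write len bytes to fd, whose current file position
 * must equal `start`, and kick off writeback after every chunk.
 * Returns 0 on success, -1 on failure. */
int write_with_incremental_sync(int fd, off_t start, const char *buf, size_t len) {
    off_t off = start;
    assert(fd >= 0 && buf != NULL);
    while (len > 0) {
        size_t chunk = len < SYNC_STEP_BYTES ? len : SYNC_STEP_BYTES;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0) return -1;
#ifdef __linux__
        /* Start asynchronous writeback of the bytes just written. If the
         * call is not supported here, fall back to a blocking sync. */
        if (sync_file_range(fd, off, n, SYNC_FILE_RANGE_WRITE) == -1 &&
            fdatasync(fd) == -1) return -1;
#else
        if (fsync(fd) == -1) return -1;
#endif
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

`SYNC_FILE_RANGE_WRITE` only initiates writeback and gives no durability guarantee by itself, so a final `fsync` is still needed when the data must be on disk.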
*	Minor refactoring for rioConnRead and adding errno (#9280)	Ewg-c	2021-07-29	1	-9/+9
\| \| \|	minor refactoring for rioConnRead and adding errno
*	Fixed some typos, add a spell check ci and others minor fix (#8890)	Binbin	2021-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR adds a spell checker CI action that will fail future PRs if they introduce typos and spelling mistakes. This spell checker is based on a blacklist of common spelling mistakes, so it will not catch everything, but at least it is also unlikely to cause false positives. Besides that, the PR also fixes many spelling mistakes and typos, not all of which were found by the spell checker we use. Here's a summary of other changes: 1. Scanned the entire source code and fixed all sorts of typos and spelling mistakes (including missing or extra spaces). 2. Outdated function / variable / argument names in comments 3. Fix outdated keyspace masks error log when we check `config.notify-keyspace-events` in loadServerConfigFromString. 4. Trim the white space at the end of line in `module.c`. Check: https://github.com/redis/redis/pull/7751 5. Some outdated https link URLs. 6. Fix some outdated comments, such as: - In README: about the rdb, we used to say it creates a `thread`, changed to `process` - dbRandomKey function comment (about dictGetRandomKey, changed to dictGetFairRandomKey) - notifyKeyspaceEvent function comment (add type arg) - Some other minor comment fixes (most of them are incorrectly quoted variable names) 7. Modified the error log so that users can easily distinguish between TCP and TLS in `changeBindAddr`
*	Handle remaining fsync errors (#8419)	Wang Yuan	2021-04-01	1	-1/+1
\| \| \| \| \| \| \| \|	In `aof.c`, we call fsync when stopping AOF, and now print a log to let the user know if it fails. In `cluster.c`, we now return an error; the calling function already handles these write errors. In `redis-cli.c`, when users want to save an rdb, we now print a message if fsync fails. In `rio.c`, we now treat fsync errors like write errors. In `server.c`, we try to fsync the AOF file when shutting down redis, and can only print a log if it fails. In `bio.c`, if fsync of the AOF file fails, we set `aof_bio_fsync_status` to error and reject writes just as for the last AOF write error; we also set the INFO command field `aof_last_write_status` to error.
*	Fix typo and outdated comments. (#8640)	Huang Zhw	2021-03-14	1	-2/+2
\|
*	more strict check in rioConnRead (#7564)	zhaozhao.zz	2020-07-24	1	-1/+1
\|
*	Fix harmless bug in rioConnRead (#7557)	Oran Agra	2020-07-23	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This code is in use only if the master is disk-based and the replica is diskless. In this case we use a buffered reader, but we must avoid reading past the rdb file into the command stream, which luckily rdb.c doesn't really attempt to do (it knows how much it should read). When rioConnRead detects that the extra buffering attempt reaches beyond the read limit, it should read less, but if the caller actually requested more, it should return an error rather than a short read. The bug would have resulted in a short read. To fix it, the code must consider the real requested size, not the extra buffering size.
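The rule stated above (read ahead for buffering, clamp at the read limit, but fail if the caller's real request crosses the limit) can be written as a small pure function. This is a simplified sketch under stated assumptions, not the actual `rioConnRead` code: `plan_read` and its parameters are hypothetical, and the choice of `EOVERFLOW` as the error code is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical planner for a buffered reader.
 *   fetched: bytes already pulled from the connection (consumed or buffered)
 *   limit:   total payload size, 0 meaning "no limit"
 *   needs:   bytes the caller still lacks beyond what is buffered (> 0)
 *   chunk:   preferred read-ahead size
 * Returns how many bytes to read next, or 0 with errno set to EOVERFLOW
 * when the caller's real request would cross the limit. */
size_t plan_read(size_t fetched, size_t limit, size_t needs, size_t chunk) {
    assert(needs > 0);
    /* Read ahead: at least what is missing, preferably a full chunk. */
    size_t toread = needs < chunk ? chunk : needs;
    if (limit != 0 && fetched + toread > limit) {
        /* Decide based on the real request, not on the read-ahead size. */
        if (fetched + needs > limit) {
            errno = EOVERFLOW;
            return 0;
        }
        toread = limit - fetched; /* buffer only up to the limit */
    }
    return toread;
}
```

With a 100-byte limit and a 64-byte chunk, a 10-byte request at offset 64 is clamped to a 36-byte read, while the same request at offset 95 is reported as an error instead of producing a short read.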
*	diskless replication rdb transfer uses pipe, and writes to sockets from the ↵	Oran Agra	2019-10-07	1	-84/+58
\| \| \| \| \| \| \| \| \| \| \| \| \|	parent process. misc: - handle SSL_has_pending by iterating through these in beforeSleep, and setting a timeout of 0 for aeProcessEvents - fix issue with epoll signaling EPOLLHUP and EPOLLERR only to the write handlers (needed to detect that the rdb pipe was closed) - add key-load-delay config for testing - trim connShutdown which is no longer needed - rioFdsetWrite -> rioFdWrite - simplified since there's no longer a need to write to multiple FDs - don't detect rdb child exited (don't call wait3) until we detect the pipe is closed - Cleanup bad optimization from rio.c, add another one
*	TLS: Connections refactoring and TLS support.	Yossi Gottlieb	2019-10-07	1	-72/+72
\| \| \| \| \| \| \| \|	* Introduce a connection abstraction layer for all socket operations and integrate it across the code base. * Provide an optional TLS connections implementation based on OpenSSL. * Pull a newer version of hiredis with TLS support. * Tests, redis-cli updates for TLS support.
*	Rio: remember read/write error conditions.	antirez	2019-07-17	1	-0/+4
\|
*	Diskless replica: fix mispelled var name.	antirez	2019-07-10	1	-1/+1
\|
*	Diskless replica: a few aesthetic changes to rio.c	antirez	2019-07-08	1	-25/+32
\|
*	Diskless replica: a few aesthetic changes to replication.c.	antirez	2019-07-08	1	-5/+7
\|
*	diskless replication on slave side (don't store rdb to file), plus some ↵	Oran Agra	2019-07-08	1	-1/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	other related fixes The diskless replication implementation was previously diskless only on the master side. The slave side still stored the received rdb file on disk before loading it back in and parsing it. This commit adds two modes to load the rdb directly from the socket: 1) when-empty 2) using "swapdb". The third mode, a diskless slave that uses flushdb, is risky and currently not included. other changes: -------------- Distinguish between aof configuration and state so that we can re-enable aof only when sync eventually succeeds (and not when exiting from readSyncBulkPayload after a failed attempt); also, a CONFIG GET and INFO during rdb loading would have lied. When loading rdb from the network, don't kill the server on short read (that can be a network error). Fix rdb check when performed on preamble AOF. tests: run replication tests for diskless slave too; make replication test a bit more aggressive; add test for diskless load swapdb
*	rdb: incremental fsync when redis saves rdb	zhaozhao.zz	2018-03-16	1	-1/+1
\|
*	fix processing of large bulks (above 2GB)	Oran Agra	2017-12-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	- protocol parsing (processMultibulkBuffer) was limited to 32bit positions in the buffer, readQueryFromClient potential overflow - rioWriteBulkCount used int, although rioWriteBulkString gave it size_t - several places in sds.c that used int for string length or index. - bugfix in RM_SaveAuxField (return was 1 or -1 and not length) - RM_SaveStringBuffer was limited to 32bit length
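To illustrate why a 32-bit `int` is too narrow for bulk lengths above 2GB, here is a small sketch of a length header formatter that takes a wide unsigned type. `format_bulk_count` is a hypothetical function and not the signature of `rioWriteBulkCount`; the `<prefix><count>\r\n` layout is assumed from the Redis protocol's bulk header format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: writes "<prefix><count>\r\n" into buf.
 * Taking the count as unsigned long long keeps values above INT_MAX
 * (2147483647) intact, where an int parameter would overflow.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_bulk_count(char *buf, size_t buflen, char prefix,
                      unsigned long long count) {
    assert(buf != NULL);
    int n = snprintf(buf, buflen, "%c%llu\r\n", prefix, count);
    if (n < 0 || (size_t)n >= buflen) return -1;
    return n;
}
```

For example, a 3000000000-byte bulk produces the 13-character header `$3000000000\r\n`, a value that cannot be represented in a 32-bit signed `int`.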
*	various cleanups and minor fixes	Oran Agra	2016-04-25	1	-2/+3
\|
*	RDMF: More consistent define names.	antirez	2015-07-27	1	-5/+5
\|
*	RDMF: redisAssert -> serverAssert.	antirez	2015-07-26	1	-1/+1
\|
*	RDMF (Redis/Disque merge friendlyness) refactoring WIP 1.	antirez	2015-07-26	1	-1/+1
\|
*	Translate rio fdset target EWOULDBLOCK error into ETIMEDOUT.	antirez	2014-10-22	1	-1/+8
\| \| \| \| \| \| \|	EWOULDBLOCK with the fdset rio target is returned when we try to write but the send timeout socket option triggered an error. Better to translate the error into something the user can actually recognize as a timeout.
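The translation itself is a one-line mapping. The following sketch uses a hypothetical helper name, `translate_send_errno`, and is not the code from the commit.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: on a blocking socket with a send timeout set,
 * a timed-out write fails with EWOULDBLOCK. Map that to ETIMEDOUT so
 * the reported error reads as a timeout; other errors pass through. */
int translate_send_errno(int err) {
    return err == EWOULDBLOCK ? ETIMEDOUT : err;
}
```

A caller would apply it right after a failed write, for example `errno = translate_send_errno(errno);`, before reporting the error.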
*	rio.c fdset write() method fixed: wrong type for return value.	antirez	2014-10-17	1	-1/+1
\|
*	rio fdset target: handle short writes.	antirez	2014-10-17	1	-2/+11
\| \| \| \| \|	While the socket is set in blocking mode, we still can get short writes writing to a socket.
*	Diskless replication: rio fdset target now supports buffering.	antirez	2014-10-17	1	-1/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Performing a socket write() for each RDB rio API write call was extremely inefficient, so now rio has minimal buffering capabilities. Writes are accumulated into a buffer and are actually written to the N slave FDs only when a given limit is reached. Trivia: rio lacked support for buffering since our targets were: 1) Memory buffers. 2) C standard I/O. Both were buffered already.
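The accumulate-then-flush behaviour described here can be sketched with a fixed-size buffer in front of a single descriptor. Everything in this example is hypothetical (`buffered_writer`, `bw_write`, `bw_flush`, and the 1024-byte `BW_LIMIT`); the actual rio code targets several descriptors and uses its own buffer type and limit.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BW_LIMIT 1024 /* assumption: flush threshold chosen for the sketch */

typedef struct {
    int fd;          /* destination descriptor */
    size_t used;     /* bytes currently buffered */
    size_t flushes;  /* number of flushes that wrote data */
    char buf[BW_LIMIT];
} buffered_writer;

/* Write out everything buffered, looping over short writes. */
int bw_flush(buffered_writer *w) {
    size_t off = 0;
    while (off < w->used) {
        ssize_t n = write(w->fd, w->buf + off, w->used - off);
        if (n <= 0) return -1;
        off += (size_t)n;
    }
    if (w->used > 0) w->flushes++;
    w->used = 0;
    return 0;
}

/* Accumulate data and only touch the descriptor once the buffer is full. */
int bw_write(buffered_writer *w, const void *data, size_t len) {
    const char *p = data;
    assert(w != NULL && data != NULL);
    while (len > 0) {
        size_t room = BW_LIMIT - w->used;
        size_t n = len < room ? len : room;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        len -= n;
        if (w->used == BW_LIMIT && bw_flush(w) == -1) return -1;
    }
    return 0;
}
```

Three 400-byte writes trigger a single `write()` of 1024 bytes and leave 176 bytes buffered, so the caller still has to call `bw_flush` once at the end.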
*	rio.c fdset target: tolerate (and report) a subset of FDs in error.	antirez	2014-10-14	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \|	The fdset target is used when we want to write an RDB file directly to slaves' sockets. In this setup, as long as there is a single slave that is still receiving our payload, we want to continue sending instead of aborting. However, rio calls should abort if no FD is ok. Also, we want the errors reported so that we can signal the parent who is ok and who is broken, so there is a new set of integers with the state of each fd. Zero is ok, non-zero is the errno of the failure, if available, or a generic EIO.
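The per-descriptor state array described above can be sketched like this. The function name `fdset_write` and its signature are hypothetical; only the convention (0 means ok, non-zero holds the errno or a generic `EIO`, abort only when no descriptor is ok) is taken from the message.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: write buf to every descriptor that is still ok.
 * state[i] == 0 means fds[i] is ok; otherwise it holds the errno of the
 * first failure, or EIO when no errno is available.
 * Returns 1 if at least one descriptor is still ok, 0 if all failed. */
int fdset_write(const int *fds, int *state, int numfds,
                const void *buf, size_t len) {
    int ok = 0;
    assert(fds != NULL && state != NULL && buf != NULL);
    for (int i = 0; i < numfds; i++) {
        if (state[i] != 0) continue; /* already broken: skip it */
        const char *p = buf;
        size_t left = len;
        while (left > 0) {
            ssize_t n = write(fds[i], p, left);
            if (n <= 0) {
                state[i] = (n == -1 && errno != 0) ? errno : EIO;
                break;
            }
            p += n;
            left -= (size_t)n;
        }
        if (state[i] == 0) ok++;
    }
    return ok > 0;
}
```

After the transfer, the caller can walk `state` to report which descriptors received the full payload and which failed, and why.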
*	rio.c: draft implementation of fdset target implemented.	antirez	2014-10-10	1	-0/+60
\|
*	rio.c refactoring before adding a new target.	antirez	2014-10-10	1	-17/+24
\|
*	Use fflush() before fsync() in rio.c.	antirez	2014-01-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	Incremental flushing in rio.c is only used to avoid huge kernel buffers synched to slow disks creating big latency spikes, so this fix has no durability implications, however it is certainly more correct to make sure that the FILE buffers are flushed to the kernel before calling fsync on the file descriptor. Thanks to Li Shao Kai for reporting this issue in the Redis mailing list.
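The ordering this commit calls for (drain the stdio buffer into the kernel, then sync the descriptor) looks like this in isolation. `flush_and_sync` is a hypothetical helper name, not a function from rio.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: fsync() only covers data the kernel already has,
 * so the FILE buffer must be flushed with fflush() first.
 * Returns 0 on success, -1 on failure. */
int flush_and_sync(FILE *fp) {
    assert(fp != NULL);
    if (fflush(fp) == EOF) return -1;        /* user-space buffer -> kernel */
    if (fsync(fileno(fp)) == -1) return -1;  /* kernel -> storage device */
    return 0;
}
```

Calling `fsync` alone would leave whatever is still sitting in the `FILE` buffer out of the sync.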
*	Chunked loading of RDB to prevent redis from stalling reading very large keys.	yoav	2013-07-16	1	-0/+4
\|
*	rio.c: added ability to fdatasync() from time to time while writing.	antirez	2013-04-24	1	-1/+30
\|
*	Introduced the Build ID in INFO and --version output.	antirez	2012-11-29	1	-2/+1
\| \| \| \| \| \| \|	The idea is to be able to identify a build in a unique way, so for instance after a bug report we can recognize that the build is the one of a popular Linux distribution and perform the debugging in the same environment.
*	BSD license added to every C source and header file.	antirez	2012-11-08	1	-3/+36
\|
*	Fixed compilation of new rio.c changes (typos and so forth.)	antirez	2012-04-09	1	-1/+3
\|
*	Add checksum computation to rio.c	antirez	2012-04-09	1	-0/+10
\|
*	rio.c file somewhat documented so that the casual reader can understand ↵	antirez	2012-04-09	1	-0/+18
\| \| \| \|	what's going on without reading the code.
*	Fixed a few warnings compiling on Linux.	antirez	2011-10-23	1	-0/+2
\|
*	rioInitWithFile and rioInitWithBuffer functions now take a rio structure ↵	antirez	2011-09-22	1	-9/+8
\| \| \| \|	pointer to avoid copying a structure to return value to the caller.
*	make sure to return just 1 for rio.c write when the target is a buffer, as ↵	antirez	2011-09-22	1	-2/+2
\| \| \| \|	we do when the target is a file.
*	Abstract file/buffer I/O to support in-memory serialization	Pieter Noordhuis	2011-05-13	1	-0/+106