summaryrefslogtreecommitdiff
path: root/src/aof.c
Commit message (Collapse)AuthorAgeFilesLines
* Remove prototypes with empty declarations (#12020)Madelyn Olson2023-05-021-4/+4
| | | Technically declaring a prototype with an empty declaration has been deprecated since the early days of C, but we never got a warning for it. C2x will apparently be introducing a breaking change if you are using this type of declarator, so Clang 15 has started issuing a warning with -pedantic. Although not apparently a problem for any of the compiler we build on, if feels like the right thing is to properly adhere to the C standard and use (void).
* Fix fork done handler wrongly update fsync metrics and enhance AOF_ ↵Binbin2023-03-291-12/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | FSYNC_ALWAYS (#11973) This PR fix several unrelated bugs that were discovered by the same set of tests (WAITAOF tests in #11713), could make the `WAITAOF` test hang. The change in `backgroundRewriteDoneHandler` is about MP-AOF. That leftover / old code assumes that we started a new AOF file just now (when we have a new base into which we're gonna incrementally write), but the fact is that with MP-AOF, the fork done handler doesn't really affect the incremental file being maintained by the parent process, there's no reason to re-issue `SELECT`, and no reason to update any of the fsync variables in that flow. This should have been deleted with MP-AOF (introduced in #9788, 7.0). The damage is that the update to `aof_fsync_offset` will cause us to miss an fsync in `flushAppendOnlyFile`, that happens if we stop write commands in `AOF_FSYNC_EVERYSEC` while an AOFRW is in progress. This caused a new `WAITAOF` test to sometime hang forever. Also because of MP-AOF, we needed to change `aof_fsync_offset` to `aof_last_incr_fsync_offset` and match it to `aof_last_incr_size` in `flushAppendOnlyFile`. This is because in the past we compared `aof_fsync_offset` and `aof_current_size`, but with MP-AOF it could be the total AOF file will be smaller after AOFRW, and the (already existing) incr file still has data that needs to be fsynced. The change in `flushAppendOnlyFile`, about the `AOF_FSYNC_ALWAYS`, it is follow #6053 (the details is in #5985), we also check `AOF_FSYNC_ALWAYS` to handle a case where appendfsync is changed from everysec to always while there is data that's written but not yet fsynced.
* Implementing the WAITAOF command (issue #10505) (#11713)Slava Koyfman2023-03-141-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implementing the WAITAOF functionality which would allow the user to block until a specified number of Redises have fsynced all previous write commands to the AOF. Syntax: `WAITAOF <num_local> <num_replicas> <timeout>` Response: Array containing two elements: num_local, num_replicas num_local is always either 0 or 1 representing the local AOF on the master. num_replicas is the number of replicas that acknowledged the a replication offset of the last write being fsynced to the AOF. Returns an error when called on replicas, or when called with non-zero num_local on a master with AOF disabled, in all other cases the response just contains number of fsync copies. Main changes: * Added code to keep track of replication offsets that are confirmed to have been fsynced to disk. * Keep advancing master_repl_offset even when replication is disabled (and there's no replication backlog, only if there's an AOF enabled). This way we can use this command and it's mechanisms even when replication is disabled. * Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK will be appended only if there's an AOF on the replica, and already ignored on old masters (thus backwards compatible) * WAIT now no longer wait for the replication offset after your last command, but rather the replication offset after your last write (or read command that caused propagation, e.g. lazy expiry). Unrelated changes: * WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI) Implementation details: * Add an atomic var named `fsynced_reploff_pending` that's updated (usually by the bio thread) and later copied to the main `fsynced_reploff` variable (only if the AOF base file exists). I.e. during the initial AOF rewrite it will not be used as the fsynced offset since the AOF base is still missing. * Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific) job that will also update fsync offset the field. * Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio worker thread, to impose ordering on their execution. This solves a race condition where a job could set `fsynced_reploff_pending` to a higher value than another pending fsync job, resulting in indicating an offset for which parts of the data have not yet actually been fsynced. Imposing an ordering on the jobs guarantees that fsync jobs are executed in increasing order of replication offset. * Drain bio jobs when switching `appendfsync` to "always" This should prevent a write race between updates to `fsynced_reploff_pending` in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and those done in the bio thread. * Drain the pending fsync when starting over a new AOF to avoid race conditions with the previous AOF offsets overriding the new one (e.g. after switching to replicate from a new master). * Make sure to update the fsynced offset at the end of the initial AOF rewrite. a must in case there are no additional writes that trigger a periodic fsync, specifically for a replica that does a full sync. Limitations: It is possible to write a module and a Lua script that propagate to the AOF and doesn't propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand. These features are incompatible with the WAITAOF command, and can result in two bad cases. The scenario is that the user executes command that only propagates to AOF, and then immediately issues a WAITAOF, and there's no further writes on the replication stream after that. 1. if the the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (will wait for an fack that the replica will never send sine it doesn't trigger any fsyncs). 2. if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change) Refactoring: * Plumbing to allow bio worker to handle multiple job types This introduces infrastructure necessary to allow BIO workers to not have a 1-1 mapping of worker to job-type. This allows in the future to assign multiple job types to a single worker, either as a performance/resource optimization, or as a way of enforcing ordering between specific classes of jobs. Co-authored-by: Oran Agra <oran@redislabs.com>
* Demoting some of the non-warning messages to notice (#10715)Binbin2023-02-191-4/+4
| | | | | | | | | | | | | | | | | We have cases where we print information (might be important but by no means an error indicator) with the LL_WARNING level. Demoting these to LL_NOTICE: - oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo - User requested shutdown... This is also true for cases that we encounter a rare but normal situation. Demoting to LL_NOTICE. Examples: - AOF was enabled but there is already another background operation. An AOF background was scheduled to start when possible. - Connection with master lost. base on yoav-steinberg's https://github.com/redis/redis/pull/10650#issuecomment-1112280554 and yossigo's https://github.com/redis/redis/pull/10650#pullrequestreview-967677676
* Cleanup around script_caller, fix tracking of scripts and ACL logging for ↵Oran Agra2023-02-161-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RM_Call (#11770) * Make it clear that current_client is the root client that was called by external connection * add executing_client which is the client that runs the current command (can be a module or a script) * Remove script_caller that was used for commands that have CLIENT_SCRIPT to get the client that called the script. in most cases, that's the current_client, and in others (when being called from a module), it could be an intermediate client when we actually want the original one used by the external connection. bugfixes: * RM_Call with C flag should log ACL errors with the requested user rather than the one used by the original client, this also solves a crash when RM_Call is used with C flag from a detached thread safe context. * addACLLogEntry would have logged info about the script_caller, but in case the script was issued by a module command we actually want the current_client. the exception is when RM_Call is called from a timer event, in which case we don't have a current_client. behavior changes: * client side tracking for scripts now tracks the keys that are read by the script instead of the keys that are declared by the caller for EVAL other changes: * Log both current_client and executing_client in the crash log. * remove prepareLuaClient and resetLuaClient, being dead code that was forgotten. * remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot that serves all commands and is reset only when execution nesting starts. * remove code to propagate CLIENT_FORCE_REPL from the executed command to the script caller since scripts aren't propagated anyway these days and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun. * fix a module GIL violation issue in afterSleep that was introduced in #10300 (unreleased)
* Reclaim page cache of RDB file (#11248)Tian2023-02-121-2/+8
| | | | | | | | | | | | | | | | | | | | # Background The RDB file is usually generated and used once and seldom used again, but the content would reside in page cache until OS evicts it. A potential problem is that once the free memory exhausts, the OS have to reclaim some memory from page cache or swap anonymous page out, which may result in a jitters to the Redis service. Supposing an exact scenario, a high-capacity machine hosts many redis instances, and we're upgrading the Redis together. The page cache in host machine increases as RDBs are generated. Once the free memory drop into low watermark(which is more likely to happen in older Linux kernel like 3.10, before [watermark_scale_factor](https://lore.kernel.org/lkml/1455813719-2395-1-git-send-email-hannes@cmpxchg.org/) is introduced, the `low watermark` is linear to `min watermark`, and there'is not too much buffer space for `kswapd` to be wake up to reclaim memory), a `direct reclaim` happens, which means the process would stall to wait for memory allocation. # What the PR does The PR introduces a capability to reclaim the cache when the RDB is operated. Generally there're two cases, read and write the RDB. For read it's a little messy to address the incremental reclaim, so the reclaim is done in one go in background after the load is finished to avoid blocking the work thread. For write, incremental reclaim amortizes the work of reclaim so no need to put it into background, and the peak watermark of cache can be reduced in this way. Two cases are addresses specially, replication and restart, for both of which the cache is leveraged to speed up the processing, so the reclaim is postponed to a right time. To do this, a flag is added to`rdbSave` and `rdbLoad` to control whether the cache need to be kept, with the default value false. # Something deserve noting 1. Though `posix_fadvise` is the POSIX standard, but only few platform support it, e.g. Linux, FreeBSD 10.0. 2. In Linux `posix_fadvise` only take effect on writeback-ed pages, so a `sync`(or `fsync`, `fdatasync`) is needed to flush the dirty page before `posix_fadvise` if we reclaim write cache. # About test A unit test is added to verify the effect of `posix_fadvise`. In integration test overall cache increase is checked, as well as the cache backed by RDB as a specific TCL test is executed in isolated Github action job.
* Add explicit error log message for AOF_TRUNCATED status when server load ↵Wen Hui2022-11-221-0/+3
| | | | | | | | | | | AOF file (#11484) Now, according to the comments, if the truncated file is not the last file, it will be considered as a fatal error. And the return code will updated to AOF_FAILED, then server will exit without any error message to the client. Similar to other error situations, this PR add an explicit error message for this case and make the client know clearly what happens.
* Add listpack encoding for list (#11303)sundb2022-11-161-31/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve memory efficiency of list keys ## Description of the feature The new listpack encoding uses the old `list-max-listpack-size` config to perform the conversion, which we can think it of as a node inside a quicklist, but without 80 bytes overhead (internal fragmentation included) of quicklist and quicklistNode structs. For example, a list key with 5 items of 10 chars each, now takes 128 bytes instead of 208 it used to take. ## Conversion rules * Convert listpack to quicklist When the listpack length or size reaches the `list-max-listpack-size` limit, it will be converted to a quicklist. * Convert quicklist to listpack When a quicklist has only one node, and its length or size is reduced to half of the `list-max-listpack-size` limit, it will be converted to a listpack. This is done to avoid frequent conversions when we add or remove at the bounding size or length. ## Interface changes 1. add list entry param to listTypeSetIteratorDirection When list encoding is listpack, `listTypeIterator->lpi` points to the next entry of current entry, so when changing the direction, we need to use the current node (listTypeEntry->p) to update `listTypeIterator->lpi` to the next node in the reverse direction. ## Benchmark ### Listpack VS Quicklist with one node * LPUSH - roughly 0.3% improvement * LRANGE - roughly 13% improvement ### Both are quicklist * LRANGE - roughly 3% improvement * LRANGE without pipeline - roughly 3% improvement From the benchmark, as we can see from the results 1. When list is quicklist encoding, LRANGE improves performance by <5%. 2. When list is listpack encoding, LRANGE improves performance by ~13%, the main enhancement is brought by `addListListpackRangeReply()`. ## Memory usage 1M lists(key:0~key:1000000) with 5 items of 10 chars ("hellohello") each. shows memory usage down by 35.49%, from 214MB to 138MB. ## Note 1. Add conversion callback to support doing some work before conversion Since the quicklist iterator decompresses the current node when it is released, we can no longer decompress the quicklist after we convert the list.
* Listpack encoding for sets (#11290)Viktor Söderqvist2022-11-091-46/+21
| | | | | | | | | | | | | | | | | | | | Small sets with not only integer elements are listpack encoded, by default up to 128 elements, max 64 bytes per element, new config `set-max-listpack-entries` and `set-max-listpack-value`. This saves memory for small sets compared to using a hashtable. Sets with only integers, even very small sets, are still intset encoded (up to 1G limit, etc.). Larger sets are hashtable encoded. This PR increments the RDB version, and has an effect on OBJECT ENCODING Possible conversions when elements are added: intset -> listpack listpack -> hashtable intset -> hashtable Note: No conversion happens when elements are deleted. If all elements are deleted and then added again, the set is deleted and recreated, thus implicitly converted to a smaller encoding.
* Fix replication inconsistency on modules that uses key space notifications ↵Meir Shpilraien (Spielrein)2022-08-181-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#10969) Fix replication inconsistency on modules that uses key space notifications. ### The Problem In general, key space notifications are invoked after the command logic was executed (this is not always the case, we will discuss later about specific command that do not follow this rules). For example, the `set x 1` will trigger a `set` notification that will be invoked after the `set` logic was performed, so if the notification logic will try to fetch `x`, it will see the new data that was written. Consider the scenario on which the notification logic performs some write commands. for example, the notification logic increase some counter, `incr x{counter}`, indicating how many times `x` was changed. The logical order by which the logic was executed is has follow: ``` set x 1 incr x{counter} ``` The issue is that the `set x 1` command is added to the replication buffer at the end of the command invocation (specifically after the key space notification logic was invoked and performed the `incr` command). The replication/aof sees the commands in the wrong order: ``` incr x{counter} set x 1 ``` In this specific example the order is less important. But if, for example, the notification would have deleted `x` then we would end up with primary-replica inconsistency. ### The Solution Put the command that cause the notification in its rightful place. In the above example, the `set x 1` command logic was executed before the notification logic, so it should be added to the replication buffer before the commands that is invoked by the notification logic. To achieve this, without a major code refactoring, we save a placeholder in the replication buffer, when finishing invoking the command logic we check if the command need to be replicated, and if it does, we use the placeholder to add it to the replication buffer instead of appending it to the end. To be efficient and not allocating memory on each command to save the placeholder, the replication buffer array was modified to reuse memory (instead of allocating it each time we want to replicate commands). Also, to avoid saving a placeholder when not needed, we do it only for WRITE or MAY_REPLICATE commands. #### Additional Fixes * Expire and Eviction notifications: * Expire/Eviction logical order was to first perform the Expire/Eviction and then the notification logic. The replication buffer got this in the other way around (first notification effect and then the `del` command). The PR fixes this issue. * The notification effect and the `del` command was not wrap with `multi-exec` (if needed). The PR also fix this issue. * SPOP command: * On spop, the `spop` notification was fired before the command logic was executed. The change in this PR would have cause the replication order to be change (first `spop` command and then notification `logic`) although the logical order is first the notification logic and then the `spop` logic. The right fix would have been to move the notification to be fired after the command was executed (like all the other commands), but this can be considered a breaking change. To overcome this, the PR keeps the current behavior and changes the `spop` code to keep the right logical order when pushing commands to the replication buffer. Another PR will follow to fix the SPOP properly and match it to the other command (we split it to 2 separate PR's so it will be easy to cherry-pick this PR to 7.0 if we chose to). #### Unhanded Known Limitations * key miss event: * On key miss event, if a module performed some write command on the event (using `RM_Call`), the `dirty` counter would increase and the read command that cause the key miss event would be replicated to the replication and aof. This problem can also happened on a write command that open some keys but eventually decides not to perform any action. We decided not to handle this problem on this PR because the solution is complex and will cause additional risks in case we will want to cherry-pick this PR. We should decide if we want to handle it in future PR's. For now, modules writers is advice not to perform any write commands on key miss event. #### Testing * We already have tests to cover cases where a notification is invoking write commands that are also added to the replication buffer, the tests was modified to verify that the replica gets the command in the correct logical order. * Test was added to verify that `spop` behavior was kept unchanged. * Test was added to verify key miss event behave as expected. * Test was added to verify the changes do not break lazy expiration. #### Additional Changes * `propagateNow` function can accept a special dbid, -1, indicating not to replicate `select`. We use this to replicate `multi/exec` on `propagatePendingCommands` function. The side effect of this change is that now the `select` command will appear inside the `multi/exec` block on the replication stream (instead of outside of the `multi/exec` block). Tests was modified to match this new behavior.
* errno cleanup around rdbLoad (#11042)Binbin2022-08-041-1/+2
| | | | | | | | | | | | | | | This is an addition to #11039, which cleans up rdbLoad* related errno. Remove the errno print from the outer message (may be invalid since errno may have been overwritten). Our aim should be the code that detects the error and knows which system call triggered it, is the one to print errno, and not the code way up above (in some cases a result of a logical error and not a system one). Remove the code to update errno in rdbLoadRioWithLoadingCtx, signature check and the rdb version check, in these cases, we do print the error message. The caller dose not have the specific logic for handling EINVAL. Small fix around rdb-preamble AOF: A truncated RDB is considered a failure, not handled the same as a truncated AOF file.
* fsync the old aof file when open a new INCR AOF (#11004)Binbin2022-07-251-2/+17
| | | | | | | | | | | | | | | | | | | | | In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1), and before openNewIncrAofForAppend, we should call redis_fsync to fsync the aof file. Because we may open a new INCR AOF in openNewIncrAofForAppend, in the case of using everysec policy, the old AOF file may not be fsynced in time (or even at all). When using everysec, we don't want to pay the disk latency from the main thread, so we will do a background fsync. Adding a argument for bioCreateCloseJob, a `need_fsync` flag to indicate that a fsync is required before the file is closed. So we will fsync the old AOF file before we close it. A cleanup, we make union become a union, since the free_* args and the fd / fsync args are never used together. Co-authored-by: Oran Agra <oran@redislabs.com>
* Set aof rewrite status in some backgroundRewriteDoneHandler errors (#10923)Binbin2022-07-031-0/+6
| | | | | We should also set aof_lastbgrewrite_status to C_ERR on these errors. Because aof rewrite did fail, and we did not finish the manifest update. Also maintain the stat_aofrw_consecutive_failures.
* Always set server.aof_last_write_errno in aof write error (#10917)Binbin2022-07-031-1/+1
| | | | | | | | | | | | | | | The `can_log` variable prevents us from outputting too many error logs. But it should not include the modification of server.aof_last_write_errno. We are doing this because: 1. In the short write case, we always set aof_last_write_errno to ENOSPC, we don't care the `can_log` flag. 2. And we always set aof_last_write_status to C_ERR in aof write error (except for FSYNC_ALWAYS, we exit). So there may be a chance that `aof_last_write_errno` is not right. An innocent bug or just a code cleanup.
* When dirCreateIfMissing or openNewIncrAofForAppend fail, set ↵Binbin2022-06-231-1/+5
| | | | | | aof_lastbgrewrite_status to err (#10775) It will be displayed in the `aof_last_bgrewrite_status` field of the INFO command.
* Fsync directory while persisting AOF manifest, RDB file, and config file ↵Tian2022-06-201-0/+10
| | | | | | | | | | | | | | | | | | | | (#10737) The current process to persist files is `write` the data, `fsync` and `rename` the file, but a underlying problem is that the rename may be lost when a sudden crash like power outage and the directory hasn't been persisted. The article [Ensuring data reaches disk](https://lwn.net/Articles/457667/) mentions a safe way to update file should be: 1. create a new temp file (on the same file system!) 2. write data to the temp file 3. fsync() the temp file 4. rename the temp file to the appropriate name 5. fsync() the containing directory This commit handles CONFIG REWRITE, AOF manifest, and RDB file (both for persistence, and the one the replica gets from the master). It doesn't handle (yet), ACL SAVE and Cluster configs, since these don't yet follow this pattern.
* Expose script flags to processCommand for better handling (#10744)Oran Agra2022-06-011-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The important part is that read-only scripts (not just EVAL_RO and FCALL_RO, but also ones with `no-writes` executed by normal EVAL or FCALL), will now be permitted to run during CLIENT PAUSE WRITE (unlike before where only the _RO commands would be processed). Other than that, some errors like OOM, READONLY, MASTERDOWN are now handled by processCommand, rather than the command itself affects the error string (and even error code in some cases), and command stats. Besides that, now the `may-replicate` commands, PFCOUNT and PUBLISH, will be considered `write` commands in scripts and will be blocked in all read-only scripts just like other write commands. They'll also be blocked in EVAL_RO (i.e. even for scripts without the `no-writes` shebang flag. This commit also hides the `may_replicate` flag from the COMMAND command output. this is a **breaking change**. background about may_replicate: We don't want to expose a no-may-replicate flag or alike to scripts, since we consider the may-replicate thing an internal concern of redis, that we may some day get rid of. In fact, the may-replicate flag was initially introduced to flag EVAL: since we didn't know what it's gonna do ahead of execution, before function-flags existed). PUBLISH and PFCOUNT, both of which because they have side effects which may some day be fixed differently. code changes: The changes in eval.c are mostly code re-ordering: - evalCalcFunctionName is extracted out of evalGenericCommand - evalExtractShebangFlags is extracted luaCreateFunction - evalGetCommandFlags is new code
* loadAppendOnlyFiles and loadSingleAppendOnlyFile ret should be AOF_OK, not ↵Binbin2022-05-291-4/+4
| | | | | | | C_OK (#10790) The ret value should be AOF_OK instead of C_OK. AOF_OK and C_OK are both 0, so this is just a cleanup. Also updated some outdated comments.
* improve logging around AOF file creation and loading (#10763)chenyang80942022-05-261-9/+25
| | | | | | | | instead of printing a log when a folder or a manifest is missing (level reduced), we print: total time it took to load all the aof files when creating a new base or incr file starting to write to an existing incr file on startup
* Delete renamed new incr when write manifest failed (#10649)chenyang80942022-04-271-5/+10
| | | Followup fix for #10616
* Fix bug when AOF enabled after startup. put the new incr file in the ↵chenyang80942022-04-261-27/+96
| | | | | | | | | | | | | | | | | | | | | manifest only when AOFRW is done. (#10616) Changes: - When AOF is enabled **after** startup, the data accumulated during `AOF_WAIT_REWRITE` will only be stored in a temp INCR AOF file. Only after the first AOFRW is successful, we will add it to manifest file. Before this fix, the manifest referred to the temp file which could cause a restart during that time to load it without it's base. - Add `aof_rewrites_consecutive_failures` info field for aofrw limiting implementation. Now we can guarantee that these behaviors of MP-AOF are the same as before (past redis releases): - When AOF is enabled after startup, the data accumulated during `AOF_WAIT_REWRITE` will only be stored in a visible place. Only after the first AOFRW is successful, we will add it to manifest file. - When disable AOF, we did not delete the AOF file in the past so there's no need to change that behavior now (yet). - When toggling AOF off and then on (could be as part of a full-sync), a crash or restart before the first rewrite is completed, would result with the previous version being loaded (might not be right thing, but that's what we always had).
* Fixes around AOF failed rewrite rate limiting (#10582)judeng2022-04-191-24/+25
| | | | | | | | | | | | Changes: 1. Check the failed rewrite time threshold only when we actually consider triggering a rewrite. i.e. this should be the last condition tested, since the test has side effects (increasing time threshold) Could have happened in some rare scenarios 2. no limit in startup state (e.g. after restarting redis that previously failed and had many incr files) 3. the “triggered the limit” log would be recorded only when the limit status is returned 4. remove failure count in log (could be misleading in some cases) Co-authored-by: chenyang8094 <chenyang8094@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com>
* Fix auto-aof-rewrite-percentage based AOFRW trigger after restart (#10550)chenyang80942022-04-071-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | The `auto-aof-rewrite-percentage` config defines at what growth percentage an automatic AOF rewrite is triggered. This normally works OK since the size of the AOF file at the end of a rewrite is stored in `server.aof_rewrite_base_size`. However, on startup, redis used to store the entire size of the AOF file into that variable, resulting in a wrong automatic AOF rewrite trigger (could have been triggered much later than desired). This issue would only affect the first AOFRW after startup, after that future AOFRW would have been triggered correctly. This bug existed in all previous versions of Redis. This PR unifies the meaning of `server.aof_rewrite_base_size`, which only represents the size of BASE AOF. Note that after an AOFRW this size includes the size of the incremental file (all the commands that executed during rewrite), so that auto-aof-rewrite-percentage is the ratio from the size of the AOF after rewrite. However, on startup, it is complicated to know that size, and we compromised on taking just the size of the base file, this means that the first rewrite after startup can happen a little bit too soon. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: yoav-steinberg <yoav@redislabs.com>
* Functions: Move library meta data to be part of the library payload. (#10500)Meir Shpilraien (Spielrein)2022-04-051-11/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## Move library meta data to be part of the library payload. Following the discussion on https://github.com/redis/redis/issues/10429 and the intention to add (in the future) library versioning support, we believe that the entire library metadata (like name and engine) should be part of the library payload and not provided by the `FUNCTION LOAD` command. The reasoning behind this is that the programmer who developed the library should be the one who set those values (name, engine, and in the future also version). **It is not the responsibility of the admin who load the library into the database.** The PR moves all the library metadata (engine and function name) to be part of the library payload. The metadata needs to be provided on the first line of the payload using the shebang format (`#!<engine> name=<name>`), example: ```lua #!lua name=test redis.register_function('foo', function() return 1 end) ``` The above script will run on the Lua engine and will create a library called `test`. ## API Changes (compare to 7.0 rc2) * `FUNCTION LOAD` command was change and now it simply gets the library payload and extract the engine and name from the payload. In addition, the command will now return the function name which can later be used on `FUNCTION DELETE` and `FUNCTION LIST`. * The description field was completely removed from`FUNCTION LOAD`, and `FUNCTION LIST` ## Breaking Changes (compare to 7.0 rc2) * Library description was removed (we can re-add it in the future either as part of the shebang line or an additional line). * Loading an AOF file that was generated by either 7.0 rc1 or 7.0 rc2 will fail because the old command syntax is invalid. ## Notes * Loading an RDB file that was generated by rc1 / rc2 **is** supported, Redis will automatically add the shebang to the libraries payloads (we can probably delete that code after 7.0.3 or so since there's no need to keep supporting upgrades from an RC build).
* fix typos in aof.c (#10513)judeng2022-04-041-10/+10
| | | | | function aofRewriteLimited in aof.c, deley->delay and NAX->MAX Co-authored-by: judeng <judeng@didiglobal.com>
* Add stream consumer group lag tracking and reporting (#9127)Itamar Haber2022-02-231-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID's as an "entries_read" counter and increments it independently when delivering new messages, unless the this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish <guy.benoish@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com>
* fix return value of loadAppendOnlyFiles (#10295)YaacovHazan2022-02-221-16/+38
| | | | | | | | | | | | | | Make sure the status return from loading multiple AOF files reflects the overall result, not just the one of the last file. When one of the AOF files succeeded to load, but the last AOF file was empty, the loadAppendOnlyFiles will return AOF_EMPTY. This commit changes this behavior, and return AOF_OK in that case. This can happen for example, when loading old AOF file, and no more commands processed, the manifest file will include base AOF file with data, and empty incr AOF file. Co-authored-by: chenyang8094 <chenyang8094@users.noreply.github.com> Co-authored-by: Oran Agra <oran@redislabs.com>
* aof rewrite and rdb save counters in info (#10178)yoav-steinberg2022-02-171-1/+1
| | | | | Add aof_rewrites and rdb_snapshots counters to info. This is useful to figure our if a rewrite or snapshot happened since last check. This was part of the (ongoing) effort to provide a safe backup solution for multipart-aof backups.
* Adapt redis-check-aof tool for Multi Part Aof (#10061)chenyang80942022-02-171-24/+31
| | | | | | | | | | | | | | | | | | | | | | | Modifications of this PR: 1. Support the verification of `Multi Part AOF`, while still maintaining support for the old-style `AOF/RDB-preamble`. `redis-check-aof` will automatically choose which mode to use according to the incoming file format. `Usage: redis-check-aof [--fix|--truncate-to-timestamp $timestamp] <AOF/manifest>` 2. Refactor part of the code to make it easier to understand 3. Currently only supports truncate (`--fix` or `--truncate-to-timestamp`) the last AOF file (may be `BASE` or `INCR`) The reasons for 3 above: - for `--fix`: Only the last AOF may be truncated, this is guaranteed by redis - for `--truncate-to-timestamp`: Normally, we only have `BASE` + `INCR` files at most, and `BASE` cannot be truncated(It only contains a timestamp annotation at the beginning of the file), so only `INCR` can be truncated. If we have a `BASE+INCR1+INCR2` file (meaning we have an interrupted AOFRW), Only `INCR2` files can be truncated at this time. If we still insist on truncate `INCR1`, we need to manually delete `INCR2` and update the manifest file, then re-run `redis-check-aof` - If we want to support truncate any file, we need to add very complicated code to support the atomic modification of multiple file deletion and update manifest, I think this is unnecessary
* Removed double semicolon at the end of line (#10305)Matteo Baccan2022-02-161-1/+1
|
* Modify AOF preamble related logs, and change the RDB aux field (#10283)chenyang80942022-02-111-7/+16
| | | | | In multi-part aof, We no longer have the concept of `RDB-preamble`, so the related logs should be removed. However, in order to print compatible logs when loading old-style AOFs, we also have to keep the relevant code. Additionally, when saving an RDB, change the RDB aux field from "aof-preamble" to "aof-base".
* Fix typo in function_load local variable (#10209)Tobias Nießen2022-01-301-2/+2
| | | Refs: https://github.com/redis/redis/pull/10141
* Delete the residual code related to aof rewrite buf (#10176)chenyang80942022-01-251-3/+1
|
* Added AOF rewrite support for functions. (#10141)Meir Shpilraien (Spielrein)2022-01-191-0/+31
| | | | | | | | | Function PR was merged without AOF rw support because we thought this feature was going to be removed on Redis 7. Tests was added on aofrw.tcl Other existing aofrw tests where slow due to unwanted rdb-key-save-delay Co-authored-by: Oran Agra <oran@redislabs.com>
* Fix additional AOF filename issues. (#10110)Yossi Gottlieb2022-01-181-12/+12
| | | | | | | | This extends the previous fix (#10049) to address any form of non-printable or whitespace character (including newlines, quotes, non-printables, etc.) Also, removes the limitation on appenddirname, to align with the way filenames are handled elsewhere in Redis.
* Use am instead of using server.aof_manifest directly to call ↵chenyang80942022-01-171-1/+1
| | | | getBaseAndIncrAppendOnlyFilesSize (#10123)
* Always create base AOF file when redis start from empty. (#10102)chenyang80942022-01-131-4/+19
| | | | | | | | | | | | | | | | | | Force create a BASE file (use a foreground `rewriteAppendOnlyFile`) when redis starts from an empty data set and `appendonly` is yes. The reasoning is that normally, after redis is running for some time, and the AOF has gone though a few rewrites, there's always a base rdb file. and the scenario where the base file is missing, is kinda rare (happens only at empty startup), so this change normalizes it. But more importantly, there are or could be some complex modules that are started with some configuration, when they create persistence they write that configuration to RDB AUX fields, so that can can always know with which configuration the persistence file they're loading was created (could be critical). there is (was) one scenario in which they could load their persisted data, and that configuration was missing, and this change fixes it. Add a new module event: REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START, similar to REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START which is async. Co-authored-by: Oran Agra <oran@redislabs.com>
* Support whitespace characters in appendfilename, and ban them in ↵chenyang80942022-01-101-10/+25
| | | | | | appenddirname (#10049) 1. Ban whitespace characters in `appenddirname` 2. Handle the case where `appendfilename` contains spaces (for backwards compatibility)
* Fix typos in aof.c / redis.conf (#10057)Binbin2022-01-051-1/+1
|
* Ban snapshot-creating commands and other admin commands from transactions ↵guybe72022-01-041-1/+4
| | | | | | | | | | | | | | | | | | | (#10015) Creating fork (or even a foreground SAVE) during a transaction breaks the atomicity of the transaction. In addition to that, it could mess up the propagated transaction to the AOF file. This change blocks SAVE, PSYNC, SYNC and SHUTDOWN from being executed inside MULTI-EXEC. It does that by adding a command flag, so that modules can flag their commands with that flag too. Besides it changes BGSAVE, BGREWRITEAOF, and CONFIG SET appendonly, to turn the scheduled flag instead of forking righ taway. Other changes: * expose `protected`, `no-async-loading`, and `no_multi` flags in COMMAND command * add a test to validate propagation of FLUSHALL inside a transaction. * add a test to validate how CONFIG SET that errors reacts in a transaction Co-authored-by: Oran Agra <oran@redislabs.com>
* Implement Multi Part AOF mechanism to avoid AOFRW overheads. (#9788)chenyang80942022-01-031-504/+988
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement Multi-Part AOF mechanism to avoid overheads during AOFRW. Introducing a folder with multiple AOF files tracked by a manifest file. The main issues with the the original AOFRW mechanism are: * buffering of commands that are processed during rewrite (consuming a lot of RAM) * freezes of the main process when the AOFRW completes to drain the remaining part of the buffer and fsync it. * double disk IO for the data that arrives during AOFRW (had to be written to both the old and new AOF files) The main modifications of this PR: 1. Remove the AOF rewrite buffer and related code. 2. Divide the AOF into multiple files, they are classified as two types, one is the the `BASE` type, it represents the full amount of data (Maybe AOF or RDB format) after each AOFRW, there is only one `BASE` file at most. The second is `INCR` type, may have more than one. They represent the incremental commands since the last AOFRW. 3. Use a AOF manifest file to record and manage these AOF files mentioned above. 4. The original configuration of `appendfilename` will be the base part of the new file name, for example: `appendonly.aof.1.base.rdb` and `appendonly.aof.2.incr.aof` 5. Add manifest-related TCL tests, and modified some existing tests that depend on the `appendfilename` 6. Remove the `aof_rewrite_buffer_length` field in info. 7. Add `aof-disable-auto-gc` configuration. By default we're automatically deleting HISTORY type AOFs. It also gives users the opportunity to preserve the history AOFs. just for testing use now. 8. Add AOFRW limiting measure. When the AOFRW failures reaches the threshold (3 times now), we will delay the execution of the next AOFRW by 1 minute. If the next AOFRW also fails, it will be delayed by 2 minutes. The next is 4, 8, 16, the maximum delay is 60 minutes (1 hour). During the limit period, we can still use the 'bgrewriteaof' command to execute AOFRW immediately. 9. Support upgrade (load) data from old version redis. 10. Add `appenddirname` configuration, as the directory name of the append only files. All AOF files and manifest file will be placed in this directory. 11. Only the last AOF file (BASE or INCR) can be truncated. Otherwise redis will exit even if `aof-load-truncated` is enabled. Co-authored-by: Oran Agra <oran@redislabs.com>
* Generate RDB with Functions only via redis-cli --functions-rdb (#9968)yoav-steinberg2022-01-021-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | This is needed in order to ease the deployment of functions for ephemeral cases, where user needs to spin up a server with functions pre-loaded. #### Details: * Added `--functions-rdb` option to _redis-cli_. * Functions only rdb via `REPLCONF rdb-filter-only functions`. This is a placeholder for a space separated inclusion filter for the RDB. In the future can be `REPLCONF rdb-filter-only "functions db:3 key-patten:user*"` and a complementing `rdb-filter-exclude` `REPLCONF` can also be added. * Handle "slave requirements" specification to RDB saving code so we can use the same RDB when different slaves express the same requirements (like functions-only) and not share the RDB when their requirements differ. This is currently just a flags `int`, but can be extended to a more complex structure with various filter fields. * make sure to support filters only in diskless replication mode (not to override the persistence file), we do that by forcing diskless (even if disabled by config) other changes: * some refactoring in rdb.c (extract portion of a big function to a sub-function) * rdb_key_save_delay used in AOFRW too * sendChildInfo takes the number of updated keys (incremental, rather than absolute) Co-authored-by: Oran Agra <oran@redislabs.com>
* Sort out mess around propagation and MULTI/EXEC (#9890)guybe72021-12-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mess: Some parts use alsoPropagate for late propagation, others using an immediate one (propagate()), causing edge cases, ugly/hacky code, and the tendency for bugs The basic idea is that all commands are propagated via alsoPropagate (i.e. added to a list) and the top-most call() is responsible for going over that list and actually propagating them (and wrapping them in MULTI/EXEC if there's more than one command). This is done in the new function, propagatePendingCommands. Callers to propagatePendingCommands: 1. top-most call() (we want all nested call()s to add to the also_propagate array and just the top-most one to propagate them) - via `afterCommand` 2. handleClientsBlockedOnKeys: it is out of call() context and it may propagate stuff - via `afterCommand`. 3. handleClientsBlockedOnKeys edge case: if the looked-up key is already expired, we will propagate the expire but will not unblock any client so `afterCommand` isn't called. in that case, we have to propagate the deletion explicitly. 4. cron stuff: active-expire and eviction may also propagate stuff 5. modules: the module API allows to propagate stuff from just about anywhere (timers, keyspace notifications, threads). I could have tried to catch all the out-of-call-context places but it seemed easier to handle it in one place: when we free the context. in the spirit of what was done in call(), only the top-most freeing of a module context may cause propagation. 6. modules: when using a thread-safe ctx it's not clear when/if the ctx will be freed. we do know that the module must lock the GIL before calling RM_Replicate/RM_Call so we propagate the pending commands when releasing the GIL. A "known limitation", which were actually a bug, was fixed because of this commit (see propagate.tcl): When using a mix of RM_Call with `!` and RM_Replicate, the command would propagate out-of-order: first all the commands from RM_Call, and then the ones from RM_Replicate Another thing worth mentioning is that if, in the past, a client would issue a MULTI/EXEC with just one write command the server would blindly propagate the MULTI/EXEC too, even though it's redundant. not anymore. This commit renames propagate() to propagateNow() in order to cause conflicts in pending PRs. propagatePendingCommands is the only caller of propagateNow, which is now a static, internal helper function. Optimizations: 1. alsoPropagate will not add stuff to also_propagate if there's no AOF and replicas 2. alsoPropagate reallocs also_propagagte exponentially, to save calls to memmove Bugfixes: 1. CONFIG SET can create evictions, sending notifications which can cause to dirty++ with modules. we need to prevent it from propagating to AOF/replicas 2. We need to set current_client in RM_Call. buggy scenario: - CONFIG SET maxmemory, eviction notifications, module hook calls RM_Call - assertion in lookupKey crashes, because current_client has CONFIG SET, which isn't CMD_WRITE 3. minor: in eviction, call propagateDeletion after notification, like active-expire and all commands (we always send a notification before propagating the command)
* Remove EVAL script verbatim replication, propagation, and deterministic ↵zhugezy2021-12-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | execution logic (#9812) # Background The main goal of this PR is to remove relevant logics on Lua script verbatim replication, only keeping effects replication logic, which has been set as default since Redis 5.0. As a result, Lua in Redis 7.0 would be acting the same as Redis 6.0 with default configuration from users' point of view. There are lots of reasons to remove verbatim replication. Antirez has listed some of the benefits in Issue #5292: >1. No longer need to explain to users side effects into scripts. They can do whatever they want. >2. No need for a cache about scripts that we sent or not to the slaves. >3. No need to sort the output of certain commands inside scripts (SMEMBERS and others): this both simplifies and gains speed. >4. No need to store scripts inside the RDB file in order to startup correctly. >5. No problems about evicting keys during the script execution. When looking back at Redis 5.0, antirez and core team decided to set the config `lua-replicate-commands yes` by default instead of removing verbatim replication directly, in case some bad situations happened. 3 years later now before Redis 7.0, it's time to remove it formally. # Changes - configuration for lua-replicate-commands removed - created config file stub for backward compatibility - Replication script cache removed - this is useless under script effects replication - relevant statistics also removed - script persistence in RDB files is also removed - Propagation of SCRIPT LOAD and SCRIPT FLUSH to replica / AOF removed - Deterministic execution logic in scripts removed (i.e. don't run write commands after random ones, and sorting output of commands with random order) - the flags indicating which commands have non-deterministic results are kept as hints to clients. - `redis.replicate_commands()` & `redis.set_repl()` changed - now `redis.replicate_commands()` does nothing and return an 1 - ...and then `redis.set_repl()` can be issued before `redis.replicate_commands()` now - Relevant TCL cases adjusted - DEBUG lua-always-replicate-commands removed # Other changes - Fix a recent bug comparing CLIENT_ID_AOF to original_client->flags instead of id. (introduced in #9780) Co-authored-by: Oran Agra <oran@redislabs.com>
* Redis Functions - Added redis function unit and Lua enginemeir@redislabs.com2021-12-021-1/+3
| | | | | | | | | | | | | | | | | Redis function unit is located inside functions.c and contains Redis Function implementation: 1. FUNCTION commands: * FUNCTION CREATE * FCALL * FCALL_RO * FUNCTION DELETE * FUNCTION KILL * FUNCTION INFO 2. Register engine In addition, this commit introduce the first engine that uses the Redis Function capabilities, the Lua engine.
* Sort out the mess around writable replicas and lookupKeyRead/Write (#9572)Viktor Söderqvist2021-11-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Writable replicas now no longer use the values of expired keys. Expired keys are deleted when lookupKeyWrite() is used, even on a writable replica. Previously, writable replicas could use the value of an expired key in write commands such as INCR, SUNIONSTORE, etc.. This commit also sorts out the mess around the functions lookupKeyRead() and lookupKeyWrite() so they now indicate what we intend to do with the key and are not affected by the command calling them. Multi-key commands like SUNIONSTORE, ZUNIONSTORE, COPY and SORT with the store option now use lookupKeyRead() for the keys they're reading from (which will not allow reading from logically expired keys). This commit also fixes a bug where PFCOUNT could return a value of an expired key. Test modules commands have their readonly and write flags updated to correctly reflect their lookups for reading or writing. Modules are not required to correctly reflect this in their command flags, but this change is made for consistency since the tests serve as usage examples. Fixes #6842. Fixes #7475.
* fix fob bad log messages in rdbSave (#9842) (#9843)Pavel Melkozerov2021-11-241-1/+2
| | | | | | logs message prints wrong file is failed to open temporary file logs have error occurred in getcwd (uses same errno to report error) Co-authored-by: Pavel Melkozerov <pavel.melkozerov@nokia.com>
* set aof rewrite status err, when fork fail (#5606)harleyliao2021-11-161-0/+1
| | | | when aof rewrite is failed by fork(), It'll be indicated by aof_last_bgrewrite_status INFO field, same as when the fork child fails later on.
* Add sanitizer support and clean up sanitizer findings (#9601)Ozan Tezcan2021-11-111-1/+5
| | | | | | | | | | | | | | | | | | | | | | | - Added sanitizer support. `address`, `undefined` and `thread` sanitizers are available. - To build Redis with desired sanitizer : `make SANITIZER=undefined` - There were some sanitizer findings, cleaned up codebase - Added tests with address and undefined behavior sanitizers to daily CI. - Added tests with address sanitizer to the per-PR CI (smoke out mem leaks sooner). Basically, there are three types of issues : **1- Unaligned load/store** : Most probably, this issue may cause a crash on a platform that does not support unaligned access. Redis does unaligned access only on supported platforms. **2- Signed integer overflow.** Although, signed overflow issue can be problematic time to time and change how compiler generates code, current findings mostly about signed shift or simple addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue as far as I can tell (checked generated code on godbolt.org). **3 -Minor leak** (redis-cli), **use-after-free**(just before calling exit()); UB means nothing guaranteed and risky to reason about program behavior but I don't think any of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues will be the real benefit.
* Replica keep serving data during repl-diskless-load=swapdb for better ↵Eduardo Semprebon2021-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | availability (#9323) For diskless replication in swapdb mode, considering we already spend replica memory having a backup of current db to restore in case of failure, we can have the following benefits by instead swapping database only in case we succeeded in transferring db from master: - Avoid `LOADING` response during failed and successful synchronization for cases where the replica is already up and running with data. - Faster total time of diskless replication, because now we're moving from Transfer + Flush + Load time to Transfer + Load only. Flushing the tempDb is done asynchronously after swapping. - This could be implemented also for disk replication with similar benefits if consumers are willing to spend the extra memory usage. General notes: - The concept of `backupDb` becomes `tempDb` for clarity. - Async loading mode will only kick in if the replica is syncing from a master that has the same repl-id the one it had before. i.e. the data it's getting belongs to a different time of the same timeline. - New property in INFO: `async_loading` to differentiate from the blocking loading - Slot to Key mapping is now a field of `redisDb` as it's more natural to access it from both server.db and the tempDb that is passed around. - Because this is affecting replicas only, we assume that if they are not readonly and write commands during replication, they are lost after SYNC same way as before, but we're still denying CONFIG SET here anyways to avoid complications. Considerations for review: - We have many cases where server.loading flag is used and even though I tried my best, there may be cases where async_loading should be checked as well and cases where it shouldn't (would require very good understanding of whole code) - Several places that had different behavior depending on the loading flag where actually meant to just handle commands coming from the AOF client differently than ones coming from real clients, changed to check CLIENT_ID_AOF instead. **Additional for Release Notes** - Bugfix - server.dirty was not incremented for any kind of diskless replication, as effect it wouldn't contribute on triggering next database SAVE - New flag for RM_GetContextFlags module API: REDISMODULE_CTX_FLAGS_ASYNC_LOADING - Deprecated RedisModuleEvent_ReplBackup. Starting from Redis 7.0, we don't fire this event. Instead, we have the new RedisModuleEvent_ReplAsyncLoad holding 3 sub-events: STARTED, ABORTED and COMPLETED. - New module flag REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD for RedisModule_SetModuleOptions to allow modules to declare they support the diskless replication with async loading (when absent, we fall back to disk-based loading). Co-authored-by: Eduardo Semprebon <edus@saxobank.com> Co-authored-by: Oran Agra <oran@redislabs.com>