delta/redis.git - github.com: antirez/redis.git

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Fix WAITAOF reply when using last_offset and last_numreplicas (#11917)	Binbin	2023-03-15	2	-9/+69
\| \| \| \| \| \| \| \| \|	WAITAOF wad added in #11713, its return is an array. But forget to handle WAITAOF in last_offset and last_numreplicas, causing WAITAOF to return a WAIT like reply. Tests was added to validate that case (both WAIT and WAITAOF). This PR also refactored processClientsWaitingReplicas a bit for better maintainability and readability.
*	Implementing the WAITAOF command (issue #10505) (#11713)	Slava Koyfman	2023-03-14	2	-2/+233
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implementing the WAITAOF functionality which would allow the user to block until a specified number of Redises have fsynced all previous write commands to the AOF. Syntax: `WAITAOF <num_local> <num_replicas> <timeout>` Response: Array containing two elements: num_local, num_replicas num_local is always either 0 or 1 representing the local AOF on the master. num_replicas is the number of replicas that acknowledged the a replication offset of the last write being fsynced to the AOF. Returns an error when called on replicas, or when called with non-zero num_local on a master with AOF disabled, in all other cases the response just contains number of fsync copies. Main changes: * Added code to keep track of replication offsets that are confirmed to have been fsynced to disk. * Keep advancing master_repl_offset even when replication is disabled (and there's no replication backlog, only if there's an AOF enabled). This way we can use this command and it's mechanisms even when replication is disabled. * Extend REPLCONF ACK to `REPLCONF ACK <ofs> FACK <ofs>`, the FACK will be appended only if there's an AOF on the replica, and already ignored on old masters (thus backwards compatible) * WAIT now no longer wait for the replication offset after your last command, but rather the replication offset after your last write (or read command that caused propagation, e.g. lazy expiry). Unrelated changes: * WAIT command respects CLIENT_DENY_BLOCKING (not just CLIENT_MULTI) Implementation details: * Add an atomic var named `fsynced_reploff_pending` that's updated (usually by the bio thread) and later copied to the main `fsynced_reploff` variable (only if the AOF base file exists). I.e. during the initial AOF rewrite it will not be used as the fsynced offset since the AOF base is still missing. * Replace close+fsync bio job with new BIO_CLOSE_AOF (AOF specific) job that will also update fsync offset the field. * Handle all AOF jobs (BIO_CLOSE_AOF, BIO_AOF_FSYNC) in the same bio worker thread, to impose ordering on their execution. This solves a race condition where a job could set `fsynced_reploff_pending` to a higher value than another pending fsync job, resulting in indicating an offset for which parts of the data have not yet actually been fsynced. Imposing an ordering on the jobs guarantees that fsync jobs are executed in increasing order of replication offset. * Drain bio jobs when switching `appendfsync` to "always" This should prevent a write race between updates to `fsynced_reploff_pending` in the main thread (`flushAppendOnlyFile` when set to ALWAYS fsync), and those done in the bio thread. * Drain the pending fsync when starting over a new AOF to avoid race conditions with the previous AOF offsets overriding the new one (e.g. after switching to replicate from a new master). * Make sure to update the fsynced offset at the end of the initial AOF rewrite. a must in case there are no additional writes that trigger a periodic fsync, specifically for a replica that does a full sync. Limitations: It is possible to write a module and a Lua script that propagate to the AOF and doesn't propagate to the replication stream. see REDISMODULE_ARGV_NO_REPLICAS and luaRedisSetReplCommand. These features are incompatible with the WAITAOF command, and can result in two bad cases. The scenario is that the user executes command that only propagates to AOF, and then immediately issues a WAITAOF, and there's no further writes on the replication stream after that. 1. if the the last thing that happened on the replication stream is a PING (which increased the replication offset but won't trigger an fsync on the replica), then the client would hang forever (will wait for an fack that the replica will never send sine it doesn't trigger any fsyncs). 2. if the last thing that happened is a write command that got propagated properly, then WAITAOF will be released immediately, without waiting for an fsync (since the offset didn't change) Refactoring: * Plumbing to allow bio worker to handle multiple job types This introduces infrastructure necessary to allow BIO workers to not have a 1-1 mapping of worker to job-type. This allows in the future to assign multiple job types to a single worker, either as a performance/resource optimization, or as a way of enforcing ordering between specific classes of jobs. Co-authored-by: Oran Agra <oran@redislabs.com>
*	Fix tail->repl_offset update in feedReplicationBuffer (#11905)	Binbin	2023-03-13	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #11666, we added a while loop and will split a big reply node to multiple nodes. The update of tail->repl_offset may be wrong. Like before #11666, we would have created at most one new reply node, and now we will create multiple nodes if it is a big reply node. Now we are creating more than one node, and the tail->repl_offset of all the nodes except the last one are incorrect. Because we update master_repl_offset at the beginning, and then use it to update the tail->repl_offset. This would have lead to an assertion during PSYNC, a test was added to validate that case. Besides that, the calculation of size was adjusted to fix tests that failed due to a combination of a very low backlog size, and some thresholds of that get violated because of the relatively high overhead of replBufBlock. So now if the backlog size / 16 is too small, we'll take PROTO_REPLY_CHUNK_BYTES instead. Co-authored-by: Oran Agra <oran@redislabs.com>
*	Large blocks of replica client output buffer could lead to psync loops and ↵	xbasel	2023-03-12	1	-0/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unnecessary memory usage (#11666) This can happen when a key almost equal or larger than the client output buffer limit of the replica is written. Example: 1. DB is empty 2. Backlog size is 1 MB 3. Client out put buffer limit is 2 MB 4. Client writes a 3 MB key 5. The shared replication buffer will have a single node which contains the key written above, and it exceeds the backlog size. At this point the client output buffer usage calculation will report the replica buffer to be 3 MB (or more) even after sending all the data to the replica. The primary drops the replica connection for exceeding the limits, the replica reconnects and successfully executes partial sync but the primary will drop the connection again because the buffer usage is still 3 MB. This happens over and over. To mitigate the problem, this fix limits the maximum size of a single backlog node to be (repl_backlog_size/16). This way a single node can't exceed the limits of the COB (the COB has to be larger than the backlog). It also means that if the backlog has some excessive data it can't trim, it would be at most about 6% overuse. other notes: 1. a loop was added in feedReplicationBuffer which caused a massive LOC change due to indentation, the actual changes are just the `min(max` and the loop. 3. an unrelated change in an existing test to speed up a server termination which took 10 seconds. Co-authored-by: Oran Agra <oran@redislabs.com>
*	Fix the bug that CLIENT REPLY OFF\|SKIP cannot receive push notifications ↵	Binbin	2023-03-12	4	-3/+191
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(#11875) This bug seems to be there forever, CLIENT REPLY OFF\|SKIP will mark the client with CLIENT_REPLY_OFF or CLIENT_REPLY_SKIP flags. With these flags, prepareClientToWrite called by addReply* will return C_ERR directly. So the client can't receive the Pub/Sub messages and any other push notifications, e.g client side tracking. In this PR, we adding a CLIENT_PUSHING flag, disables the reply silencing flags. When adding push replies, set the flag, after the reply, clear the flag. Then add the flag check in prepareClientToWrite. Fixes #11874 Note, the SUBSCRIBE command response is a bit awkward, see https://github.com/redis/redis-doc/pull/2327 Co-authored-by: Oran Agra <oran@redislabs.com>
*	Fix race in sentinel manual failover test (#11900)	Binbin	2023-03-12	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #9408, we added some SENTINEL DEBUG to reduce default timeouts and allow tests to execute faster. The change in 05-manual.tcl may cause a race that SENTINEL FAILOVER response with a NOGOODSLAVE: ``` Manual failover works: FAILED: Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test) (Jumping to next unit after error) FAILED: caught an error in the test assertion:Expected NOGOODSLAVE No suitable replica to promote eq "OK" (context: type eval line 6 cmd {assert {$reply eq "OK"}} proc ::test) ``` The reason is that the info-period value was reduced in #9408 (the default value is 10000), and then manual failover was performed immediately, but the INFO may not exchanged between the sentinel and replicas, causing the sentinel to skip all the replicas in sentinelSelectSlave (Because replica's info_refresh is not updated, see the code snippet below), then return a NOGOODSLAVE, break the test. Code snippet from sentinelSelectSlave: ``` while((de = dictNext(di)) != NULL) { sentinelRedisInstance slave = dictGetVal(de); mstime_t info_validity_time; if (master->flags & SRI_S_DOWN) info_validity_time = sentinel_ping_period5; else info_validity_time = sentinel_info_period*3; if (mstime() - slave->info_refresh > info_validity_time) continue; } ``` By adding a wait_for_condition, we have the opportunity to let sentinel update the info_period of the replicas.
*	Add reply_schema to command json files (internal for now) (#10273)	guybe7	2023-03-11	33	-51/+455
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Work in progress towards implementing a reply schema as part of COMMAND DOCS, see #9845 Since ironing the details of the reply schema of each and every command can take a long time, we would like to merge this PR when the infrastructure is ready, and let this mature in the unstable branch. Meanwhile the changes of this PR are internal, they are part of the repo, but do not affect the produced build. ### Background In #9656 we add a lot of information about Redis commands, but we are missing information about the replies ### Motivation 1. Documentation. This is the primary goal. 2. It should be possible, based on the output of COMMAND, to be able to generate client code in typed languages. In order to do that, we need Redis to tell us, in detail, what each reply looks like. 3. We would like to build a fuzzer that verifies the reply structure (for now we use the existing testsuite, see the "Testing" section) ### Schema The idea is to supply some sort of schema for the various replies of each command. The schema will describe the conceptual structure of the reply (for generated clients), as defined in RESP3. Note that the reply structure itself may change, depending on the arguments (e.g. `XINFO STREAM`, with and without the `FULL` modifier) We decided to use the standard json-schema (see https://json-schema.org/) as the reply-schema. Example for `BZPOPMIN`: ``` "reply_schema": { "oneOf": [ { "description": "Timeout reached and no elements were popped.", "type": "null" }, { "description": "The keyname, popped member, and its score.", "type": "array", "minItems": 3, "maxItems": 3, "items": [ { "description": "Keyname", "type": "string" }, { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } ] } ``` #### Notes 1. It is ok that some commands' reply structure depends on the arguments and it's the caller's responsibility to know which is the relevant one. this comes after looking at other request-reply systems like OpenAPI, where the reply schema can also be oneOf and the caller is responsible to know which schema is the relevant one. 2. The reply schemas will describe RESP3 replies only. even though RESP3 is structured, we want to use reply schema for documentation (and possibly to create a fuzzer that validates the replies) 3. For documentation, the description field will include an explanation of the scenario in which the reply is sent, including any relation to arguments. for example, for `ZRANGE`'s two schemas we will need to state that one is with `WITHSCORES` and the other is without. 4. For documentation, there will be another optional field "notes" in which we will add a short description of the representation in RESP2, in case it's not trivial (RESP3's `ZRANGE`'s nested array vs. RESP2's flat array, for example) Given the above: 1. We can generate the "return" section of all commands in [redis-doc](https://redis.io/commands/) (given that "description" and "notes" are comprehensive enough) 2. We can generate a client in a strongly typed language (but the return type could be a conceptual `union` and the caller needs to know which schema is relevant). see the section below for RESP2 support. 3. We can create a fuzzer for RESP3. ### Limitations (because we are using the standard json-schema) The problem is that Redis' replies are more diverse than what the json format allows. This means that, when we convert the reply to a json (in order to validate the schema against it), we lose information (see the "Testing" section below). The other option would have been to extend the standard json-schema (and json format) to include stuff like sets, bulk-strings, error-string, etc. but that would mean also extending the schema-validator - and that seemed like too much work, so we decided to compromise. Examples: 1. We cannot tell the difference between an "array" and a "set" 2. We cannot tell the difference between simple-string and bulk-string 3. we cannot verify true uniqueness of items in commands like ZRANGE: json-schema doesn't cover the case of two identical members with different scores (e.g. `[["m1",6],["m1",7]]`) because `uniqueItems` compares (member,score) tuples and not just the member name. ### Testing This commit includes some changes inside Redis in order to verify the schemas (existing and future ones) are indeed correct (i.e. describe the actual response of Redis). To do that, we added a debugging feature to Redis that causes it to produce a log of all the commands it executed and their replies. For that, Redis needs to be compiled with `-DLOG_REQ_RES` and run with `--reg-res-logfile <file> --client-default-resp 3` (the testsuite already does that if you run it with `--log-req-res --force-resp3`) You should run the testsuite with the above args (and `--dont-clean`) in order to make Redis generate `.reqres` files (same dir as the `stdout` files) which contain request-response pairs. These files are later on processed by `./utils/req-res-log-validator.py` which does: 1. Goes over req-res files, generated by redis-servers, spawned by the testsuite (see logreqres.c) 2. For each request-response pair, it validates the response against the request's reply_schema (obtained from the extended COMMAND DOCS) 5. In order to get good coverage of the Redis commands, and all their different replies, we chose to use the existing redis test suite, rather than attempt to write a fuzzer. #### Notes about RESP2 1. We will not be able to use the testing tool to verify RESP2 replies (we are ok with that, it's time to accept RESP3 as the future RESP) 2. Since the majority of the test suite is using RESP2, and we want the server to reply with RESP3 so that we can validate it, we will need to know how to convert the actual reply to the one expected. - number and boolean are always strings in RESP2 so the conversion is easy - objects (maps) are always a flat array in RESP2 - others (nested array in RESP3's `ZRANGE` and others) will need some special per-command handling (so the client will not be totally auto-generated) Example for ZRANGE: ``` "reply_schema": { "anyOf": [ { "description": "A list of member elements", "type": "array", "uniqueItems": true, "items": { "type": "string" } }, { "description": "Members and their scores. Returned in case `WITHSCORES` was used.", "notes": "In RESP2 this is returned as a flat array", "type": "array", "uniqueItems": true, "items": { "type": "array", "minItems": 2, "maxItems": 2, "items": [ { "description": "Member", "type": "string" }, { "description": "Score", "type": "number" } ] } } ] } ``` ### Other changes 1. Some tests that behave differently depending on the RESP are now being tested for both RESP, regardless of the special log-req-res mode ("Pub/Sub PING" for example) 2. Update the history field of CLIENT LIST 3. Added basic tests for commands that were not covered at all by the testsuite ### TODO - [x] (maybe a different PR) add a "condition" field to anyOf/oneOf schemas that refers to args. e.g. when `SET` return NULL, the condition is `arguments.get\|\|arguments.condition`, for `OK` the condition is `!arguments.get`, and for `string` the condition is `arguments.get` - https://github.com/redis/redis/issues/11896 - [x] (maybe a different PR) also run `runtest-cluster` in the req-res logging mode - [x] add the new tests to GH actions (i.e. compile with `-DLOG_REQ_RES`, run the tests, and run the validator) - [x] (maybe a different PR) figure out a way to warn about (sub)schemas that are uncovered by the output of the tests - https://github.com/redis/redis/issues/11897 - [x] (probably a separate PR) add all missing schemas - [x] check why "SDOWN is triggered by misconfigured instance replying with errors" fails with --log-req-res - [x] move the response transformers to their own file (run both regular, cluster, and sentinel tests - need to fight with the tcl including mechanism a bit) - [x] issue: module API - https://github.com/redis/redis/issues/11898 - [x] (probably a separate PR): improve schemas: add `required` to `object`s - https://github.com/redis/redis/issues/11899 Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com> Co-authored-by: Hanna Fadida <hanna.fadida@redislabs.com> Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: Shaya Potter <shaya@redislabs.com>
*	Fix Uninitialised value error in createSparklineSequence (LATENCY GRAPH) ↵	Binbin	2023-03-09	1	-1/+19
\| \| \| \| \| \| \|	(#11892) This was exposed by a new LATENCY GRAPH valgrind test. There are no security implications, fix by initializing these members.
*	Fix test and improve assert_replication_stream print the whole stream (#11793)	Binbin	2023-03-08	3	-3/+21
\| \| \| \| \| \| \| \| \| \|	This PR has two parts: 1. Fix flaky test case, the previous tests set a lot of volatile keys, it injects an unexpected DEL command into the replication stream during the later test, causing it to fail. Add a flushall to avoid it. 2. Improve assert_replication_stream, now it can print the whole stream rather than just the failing line.
*	Fix an issue when module decides to unblock a client which is blocked on ↵	ranshid	2023-03-08	2	-3/+38
\| \| \| \| \| \| \| \| \| \| \| \|	keys (#11832) Currently (starting at #11012) When a module is blocked on keys it sets the CLIENT_PENDING_COMMAND flag. However in case the module decides to unblock the client not via the regular flow (eg timeout, key signal or CLIENT UNBLOCK command) it will attempt to reprocess the module command and potentially blocked again. This fix remove the CLIENT_PENDING_COMMAND flag in case blockedForKeys is issued from module context.
*	Solve race in CLIENT NO-TOUCH lru test (#11883)	Binbin	2023-03-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	I've seen it fail here (test-centos7-tls-module-no-tls and test-freebsd): ``` *** [err]: Operations in no-touch mode do not alter the last access time of a key in tests/unit/introspection-2.tcl Expected '244296' to be more than '244296' (context: type eval line 12 cmd {assert_morethan $newlru $oldlru} proc ::test) ``` Our LRU_CLOCK_RESOLUTION value is 1000ms, and default hz is 10, so if the test is really fast, or the timing is just right, newlru will be the same as oldlru. We fixed this by changing `after 1000` to `after 1100`.
*	Skip test for sdsRemoveFreeSpace when mem_allocator is not jemalloc (#11878)	sundb	2023-03-07	3	-1/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test `trim on SET with big value` (introduced from #11817) fails under mac m1 with libc mem_allocator. The reason is that malloc(33000) will allocate 65536 bytes(>42000). This test still passes under ubuntu with libc mem_allocator. ``` *** [err]: trim on SET with big value in tests/unit/type/string.tcl Expected [r memory usage key] < 42000 (context: type source line 471 file /Users/iospack/data/redis_fork/tests/unit/type/string.tcl cmd {assert {[r memory usage key] < 42000}} proc ::test) ``` simple test under mac m1 with libc mem_allocator: ```c void *p = zmalloc(33000); printf("malloc size: %zu\n", zmalloc_size(p)); # output malloc size: 65536 ```
*	Increase the threshold of the AOF loading defrag test (#11871)	Binbin	2023-03-04	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This test is very sensitive and fragile. It often fails in Daily, in most cases, it failed in test-ubuntu-32bit (the AOF loading one), with the range in (31, 40): ``` [err]: Active defrag in tests/unit/memefficiency.tcl Expected 38 <= 30 (context: type eval line 113 cmd {assert {$max_latency <= 30}} proc ::test) ``` The AOF loading part isn't tightly fixed to the cron hz. It calls processEventsWhileBlocked once in every 1024 command calls. ``` /* Serve the clients from time to time */ if (!(loops++ % 1024)) { off_t progress_delta = ftello(fp) - last_progress_report_size; loadingIncrProgress(progress_delta); last_progress_report_size += progress_delta; processEventsWhileBlocked(); processModuleLoadingProgressEvent(1); } ``` In this case, we can either decrease the 1024 or increase the threshold of just the AOF part of that test. Considering the test machines are sometimes slow, and all sort of quirks could happen (which do not indicate a bug), and we've already set to 30, we suppose we can set it a little bit higher, set it to 40. We can have this instead of adding another testing config (we can add it when we really need it). Fixes #11868
*	Try to trim strings only when applicable (#11817)	uriyage	2023-02-28	3	-1/+62
\| \| \| \| \| \| \| \| \| \|	As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at #11508). Only call the function when there is a possibility that the string contains free space. * For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG * For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN * For strings coming from modules, it could be anything. Co-authored-by: Oran Agra <oran@redislabs.com> Co-authored-by: sundb <sundbcn@gmail.com>
*	Integer Overflow in RAND commands can lead to assertion (CVE-2023-25155) ↵	Oran Agra	2023-02-28	3	-0/+9
\| \| \| \| \| \| \|	(#11857) Issue happens when passing a negative long value that greater than the max positive value that the long can store.
*	String pattern matching had exponential time complexity on pathological ↵	Oran Agra	2023-02-28	1	-0/+6
\| \| \| \| \| \| \| \| \|	patterns (CVE-2022-36021) (#11858) Authenticated users can use string matching commands with a specially crafted pattern to trigger a denial-of-service attack on Redis, causing it to hang and consume 100% CPU time. Co-authored-by: Tom Levy <tomlevy93@gmail.com>
*	Fix possible memory corruption in FLUSHALL when a client watches more than ↵	ranshid	2023-02-28	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	one key (#11854) Avoid calling unwatchAllKeys when running touchAllWatchedKeysInDb (which was unnecessary) This can potentially lead to use-after-free and memory corruption when the next entry pointer held by the watched keys iterator is freed when unwatching all keys of a specific client. found with address sanitizer, added a test which will not always fail (depending on the random dict hashing seed) problem introduced in #9829 (Reids 7.0) Co-authored-by: Oran Agra <oran@redislabs.com>
*	Try to solve valgrind CI test error with client-eviction test (#11822)	Oran Agra	2023-02-23	1	-0/+1
\| \| \| \| \| \|	The test sporadically failed with valgrind trying to match `no client named obuf-client1 found*` in the log it looks like `obuf-client1` was indeed dropped, so i'm guessing it's because CLIENT LIST was processed first.
*	Add CLIENT NO-TOUCH for clients to run commands without affecting LRU/LFU of ↵	Chen Tianjie	2023-02-23	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	keys (#11483) When no-touch mode is enabled, the client will not touch LRU/LFU of the keys it accesses, except when executing command `TOUCH`. This allows inspecting or modifying the key-space without affecting their eviction. Changes: - A command `CLIENT NO-TOUCH ON\|OFF` to switch on and off this mode. - A client flag `#define CLIENT_NOTOUCH (1ULL<<45)`, which can be shown with `CLIENT INFO`, by the letter "T" in the "flags" field. - Clear `NO-TOUCH` flag in `clearClientConnectionState`, which is used by `RESET` command and resetting temp clients used by modules. - Also clear `NO-EVICT` flag in `clearClientConnectionState`, this might have been an oversight, spotted by @madolson. - A test using `DEBUG OBJECT` command to verify that LRU stat is not touched when no-touch mode is on. Co-authored-by: chentianjie <chentianjie@alibaba-inc.com> Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com> Co-authored-by: sundb <sundbcn@gmail.com>
*	Speed up test: client evicted due to client tracking prefixes (#11823)	Binbin	2023-02-21	1	-51/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We noticed that `client evicted due to client tracking prefixes` takes over 200 seconds with valgrind. We combine three prefixes in each command, this will probably save us half the testing time. Before: normal: 3508ms, valgrind: 289503ms -> 290s With three prefixes, normal: 1500ms, valgrind: 135742ms -> 136s Since we did not actually count the memory usage of all prefixes, see getClientMemoryUsage, so we can not use larger prefixes to speed up the test here. Also this PR cleaned up some spaces (IDE jobs) and typos.
*	Prevent Redis from crashing from key tracking invalidations (#11814)	Madelyn Olson	2023-02-21	1	-0/+38
\| \| \|	There is a built in limit to client side tracking keys, which when exceeded will invalidate keys. This occurs in two places, one in the server cron and other before executing a command. If it happens in the second scenario, the invalidations will be queued for later since current client is set. This queue is never drained if a command is not executed (through call) such as a multi-exec command getting queued. This results in a later server assert crashing.
*	add test case and comments for active expiry in the writeable replica (#11789)	judeng	2023-02-20	1	-0/+17
\| \| \| \| \| \|	This test case is to cover a edge scenario: when a writable replica enabled AOF at the same time, active expiry keys which was created in writable replicas should propagate to the AOF file, and some versions might crash (fixed by #11615). For details, please refer to #11778
*	Cleanup around script_caller, fix tracking of scripts and ACL logging for ↵	Oran Agra	2023-02-16	6	-13/+194
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RM_Call (#11770) * Make it clear that current_client is the root client that was called by external connection * add executing_client which is the client that runs the current command (can be a module or a script) * Remove script_caller that was used for commands that have CLIENT_SCRIPT to get the client that called the script. in most cases, that's the current_client, and in others (when being called from a module), it could be an intermediate client when we actually want the original one used by the external connection. bugfixes: * RM_Call with C flag should log ACL errors with the requested user rather than the one used by the original client, this also solves a crash when RM_Call is used with C flag from a detached thread safe context. * addACLLogEntry would have logged info about the script_caller, but in case the script was issued by a module command we actually want the current_client. the exception is when RM_Call is called from a timer event, in which case we don't have a current_client. behavior changes: * client side tracking for scripts now tracks the keys that are read by the script instead of the keys that are declared by the caller for EVAL other changes: * Log both current_client and executing_client in the crash log. * remove prepareLuaClient and resetLuaClient, being dead code that was forgotten. * remove scriptTimeSnapshot and snapshot_time and instead add cmd_time_snapshot that serves all commands and is reset only when execution nesting starts. * remove code to propagate CLIENT_FORCE_REPL from the executed command to the script caller since scripts aren't propagated anyway these days and anyway this flag wouldn't have had an effect since CLIENT_PREVENT_PROP is added by scriptResetRun. * fix a module GIL violation issue in afterSleep that was introduced in #10300 (unreleased)
*	Remove wrong code in list pot timeout test (#11805)	Binbin	2023-02-15	1	-1/+0
\| \| \| \| \| \| \| \| \|	In #9373, actually need to replace `$rd $pop blist1{t} blist2{t} 1` with `bpop_command_two_key $rd $pop blist1{t} blist2{t} 1` but forgot to delete the latter. This doesn't affect the test, because the later assert_error "WRONGTYPE" is expected (and right). And if we read $rd again, it will get the wrong result, like 'ERR unknown command 'BLMPOP_LEFT' \| 'BLMPOP_RIGHT'
*	Minor changes around the blockonkeys test module (#11803)	guybe7	2023-02-14	3	-149/+74
\| \| \| \| \| \| \| \| \| \| \| \|	All of the POP commands must not decr length below 0. So, get_fsl will delete the key if the length is 0 (unless the caller wished to create if doesn't exist) Other: 1. Use REDISMODULE_WRITE where needed (POP commands) 2. Use wait_for_blokced_clients in test Unrelated: Use quotes instead of curly braces in zset.tcl, for variable expansion
*	SCAN/RANDOMKEY and lazy-expire (#11788)	guybe7	2023-02-14	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Starting from Redis 7.0 (#9890) we started wrapping everything a command propagates with MULTI/EXEC. The problem is that both SCAN and RANDOMKEY can lazy-expire arbitrary keys (similar behavior to active-expire), and put DELs in a transaction. Fix: When these commands are called without a parent exec-unit (e.g. not in EVAL or MULTI) we avoid wrapping their DELs in a transaction (for the same reasons active-expire and eviction avoids a transaction) This PR adds a per-command flag that indicates that the command may touch arbitrary keys (not the ones in the arguments), and uses that flag to avoid the MULTI-EXEC. For now, this flag is internal, since we're considering other solutions for the future. Note for cluster mode: if SCAN/RANDOMKEY is inside EVAL/MULTI it can still cause the same situation (as it always did), but it won't cause a CROSSSLOT because replicas and AOF do not perform slot checks. The problem with the above is mainly for 3rd party ecosystem tools that propagate commands from master to master, or feed an AOF file with redis-cli into a master. This PR aims to fix the regression in redis 7.0, and we opened #11792 to try to handle the bigger problem with lazy expire better for another release.
*	Match REDISMODULE_OPEN_KEY_* flags to LOOKUP_* flags (#11772)	Meir Shpilraien (Spielrein)	2023-02-09	2	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \|	The PR adds support for the following flags on RedisModule_OpenKey: * REDISMODULE_OPEN_KEY_NONOTIFY - Don't trigger keyspace event on key misses. * REDISMODULE_OPEN_KEY_NOSTATS - Don't update keyspace hits/misses counters. * REDISMODULE_OPEN_KEY_NOEXPIRE - Avoid deleting lazy expired keys. * REDISMODULE_OPEN_KEY_NOEFFECTS - Avoid any effects from fetching the key In addition, added `RM_GetOpenKeyModesAll`, which returns the mask of all supported OpenKey modes. This allows the module to check, in runtime, which OpenKey modes are supported by the current Redis instance.
*	When DEBUG LOADAOF fails, return an error instead of exiting (#11790)	Binbin	2023-02-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Return an error when loadAppendOnlyFiles fails instead of exiting. DEBUF LOADAOF command is only meant to be used by the test suite, and only by tests that generated an AOF file first. So this change is ok (considering that the caller is likely to catch this error and die). This actually revert part of the code in #9012, and now DEBUG LOADAOF behaves the same as DEBUG RELOAD (returns an error when the load fails). Plus remove a `after 2000` in a test, which can save times (looks like copy paste error).
*	Optimize ZRANGE replies WITHSCORES in case of integer scores (#11779)	filipe oliveira	2023-02-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	If we have integer scores on the sorted set we're not using the fastest way to reply by calling `d2string` which uses `double2ll` and `ll2string` when it can, instead of `fpconv_dtoa`. This results by some 50% performance improvement in certain cases of integer scores for both RESP2 and RESP3, and no apparent impact on double scores. Co-authored-by: Oran Agra <oran@redislabs.com>
*	Fix unstable test: replication with parallel clients writing in different ↵	Binbin	2023-02-03	1	-5/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DBs (#11782) Failure happens in FreeBSD daily: ``` *** [err]: Test replication with parallel clients writing in different DBs in tests/integration/replication-4.tcl Expected [::redis::redisHandle2 dbsize] > 0 (context: type eval line 19 cmd {assert {[$master dbsize] > 0}} proc ::test) ``` The test is failing because db 9 has no data (default db), and according to the log, we can see that db 9 does not have a key: ``` ### Starting test Test replication with parallel clients writing in different DBs in tests/integration/replication-4.tcl 3338:S 03 Feb 2023 00:15:18.723 - DB 11: 1 keys (0 volatile) in 4 slots HT. 3338:S 03 Feb 2023 00:15:18.723 - DB 12: 141 keys (0 volatile) in 256 slots HT. ``` We use `wait_for_condition` to ensure that parallel clients have written data before calling stop_bg_complex_data. At the same time, `wait_for_condition` is also used to remove the above `after 1000`, which can save time in most cases.
*	Fix timing issue in new ACL log test (#11781)	Binbin	2023-02-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a timing issue in the new ACL log test: ``` *** [err]: ACL LOG aggregates similar errors together and assigns unique entry-id to new errors in tests/unit/acl.tcl Expected 1675382873989 < 1675382873989 (context: type eval line 15 cmd {assert {$timestamp_last_update_original < $timestamp_last_updated_after_update}} proc ::test) ``` Looking at the test code, we will check the `timestamp-last-updated` before and after a new ACL error occurs. Actually `WRONGPASS` errors can be executed very quickly on fast machines. For example, in the this case, the execution is completed within one millisecond. The error is easy to reproduce, if we reduce the number of the for loops, for example set to 2, and using --loop and --stop. Avoid this timing issue by adding an `after 1` before the new errors. The test was introduced in #11477.
*	Added fields to ACL LOG error entries for precise time logging (#11477)	Roshan Khatri	2023-02-02	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \|	Added 3 fields to the ACL LOG - adds entry_id, timestamp_created and timestamp_last_updated, which updates similar existing log error entries. The pair - entry_id, timestamp_created is a unique identifier of this entry, in case the node dies and is restarted, it can detect that if it's a new series. The primary use case of Unique id is to uniquely identify the error messages and not to detect if the server has restarted. entry-id is the sequence number of the entry (starting at 0) since the server process started. Can also be used to check if items were "lost" if they fell between periods. timestamp-created is the unix-time in ms at the time the entry was first created. timestamp-last-updated is the unix-time in ms at the time the entry was last updated Time_created gives the absolute time which better accounts for network time as compared to time since. It can also be older than 60 secs and presently there is no field that can display the original time of creation once the error entry is updated. The reason of timestamp_last_updated field is that it provides a more precise value for the “last time” an error was seen where as, presently it is only in the 60 second period. Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
*	Propagate message to a node only if the cluster link is healthy. (#11752)	Harkrishn Patro	2023-02-02	1	-1/+55
\| \| \| \| \| \| \| \|	Currently while a sharded pubsub message publish tries to propagate the message across the cluster, a NULL check is missing for clusterLink. clusterLink could be NULL if the link is causing memory beyond the set threshold cluster-link-sendbuf-limit and server terminates the link. This change introduces two things: Avoids the engine crashes on the publishing node if a message is tried to be sent to a node and the link is NULL. Adds a debugging tool CLUSTERLINK KILL to terminate the clusterLink between two nodes.
*	Fix handshake timeout replication test race (#11773)	Binbin	2023-02-01	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test on x86 + TLS fail with this error: ``` *** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl Replica is not able to detect timeout ``` The replica logs is: ``` ### Starting test Slave is able to detect timeout during handshake in tests/integration/replication.tcl 7681:S 05 Jan 2023 00:21:56.635 * Non blocking connect for SYNC fired the event. 7681:S 05 Jan 2023 00:21:56.638 * Master replied to PING, replication can continue... 7681:S 05 Jan 2023 00:21:56.638 * Trying a partial resynchronization (request ef70638885500aad12dd673c68ca1541116a59fe:1). 7681:S 05 Jan 2023 00:22:56.894 # Failed to read response from the server: error:0A000126:SSL routines::unexpected eof while reading 7681:S 05 Jan 2023 00:22:56.894 # Master did not reply to PSYNC, will try later ``` This is another issue that appeared after #11640 was merged. This PR try to fix it. The idea is to make it stable in `wait_bgsave`, for example, it may wait until the next psync retry in the following situation: `Master did not reply to PSYNC, will try later` Other than that, the change will make the test more consistent / predictable since it'll mean the master is always frozen in the desired state (waiting for repl-diskless-sync-delay to happen, rather than earlier stages of the handshake).
*	Optimization: sdsRemoveFreeSpace to avoid realloc on noop (#11766)	uriyage	2023-01-31	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In #7875 (Redis 6.2), we changed the sds alloc to be the usable allocation size in order to: > reduce the need for realloc calls by making the sds implicitly take over the internal fragmentation This change was done most sds functions, excluding `sdsRemoveFreeSpace` and `sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer) we call sdsRemoveFreeSpace when we see excessive free space and want to trim it. so if we don't trim it exactly to size, the caller may still see excessive free space and call it again and again. However, this resulted in some excessive calls to realloc, even when there's no need and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13). It turns out that a call for realloc with jemalloc can be expensive even if it ends up doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid the call for realloc. in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common code. the difference between them was that sdsResize would avoid using SDS_TYPE_5, since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace would permit using SDS_TYPE_5 and get an optimal memory consumption. now both methods take a `would_regrow` argument that makes it more explicit. the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both. The new test that was added to cover this concern used to pass before this PR as well, this PR is just a performance optimization and cleanup. Benchmark: `redis-benchmark -c 100 -t set -d 512 -P 10 -n 100000000` on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs). some 4.5% improvement. Co-authored-by: Oran Agra <oran@redislabs.com>
*	Obuf limit, exit during loop in RAND commands and KEYS (#11676)	Oran Agra	2023-01-16	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Related to the hang reported in #11671 Currently, redis can disconnect a client due to reaching output buffer limit, it'll also avoid feeding that output buffer with more data, but it will keep running the loop in the command (despite the client already being marked for disconnection) This PR is an attempt to mitigate the problem, specifically for commands that are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER. The RAND family of commands can take a negative COUNT argument (which is not bound to the number of elements in the key), so it's enough to create a key with one field, and then these commands can be used to hang redis. For KEYS the caller can use the existing keyspace in redis (if big enough).
*	Fix range issues in ZRANDMEMBER and HRANDFIELD (CVE-2023-22458) (#11674)	Oran Agra	2023-01-16	2	-0/+10
\| \| \| \|	missing range check in ZRANDMEMBER and HRANDIFLD leading to panic due to protocol limitations
*	Avoid integer overflows in SETRANGE and SORT (CVE-2022-35977) (#11720)	Oran Agra	2023-01-16	2	-0/+21
\| \| \| \| \|	Authenticated users issuing specially crafted SETRANGE and SORT(_RO) commands can trigger an integer overflow, resulting with Redis attempting to allocate impossible amounts of memory and abort with an OOM panic.
*	Blocking command with a 0.001 seconds timeout blocks indefinitely (#11688)	Gabi Ganam	2023-01-08	1	-0/+10
\| \| \|	Any value in the range of [0-1) turns to 0 when being cast from double to long long. This change rounds up instead of down for values that can't be stored precisely as long doubles.
*	Fix potential issue with Lua argv caching, module command filter and libc ↵	Oran Agra	2023-01-04	2	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	realloc (#11652) TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with RM_CommandFilterArgInsert being called from scripts, which can lead to memory corruption. Libc realloc can return the same pointer even if the size was changed. The code in freeLuaRedisArgv had an assumption that if the pointer didn't change, then the allocation didn't change, and the cache can still be reused. However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were used, it could be that we realloced the argv array, and the pointer didn't change, then a consecutive command being executed from Lua can use that argv cache reaching beyond its size. This was actually only possible with modules, since the decision to realloc was based on argc, rather than argv_len.
*	fix handshake timeout replication test race (#11640)	Oran Agra	2023-01-04	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test on ARM + TLS often fail with this error: ``` *** [err]: Slave is able to detect timeout during handshake in tests/integration/replication.tcl Replica is not able to detect timeout ``` https://github.com/redis/redis-extra-ci/actions/runs/3727554226/jobs/6321797837 The replica logs show that in this case the replica got timeout before even getting a response to the PING command (instead of the SYNC command). it should have shown these: ``` * MASTER <-> REPLICA sync started * REPLICAOF 127.0.0.1:22112 enabled .... ### Starting test Slave enters handshake in tests/integration/replication.tcl * Non blocking connect for SYNC fired the event. ``` then: ``` * Master replied to PING, replication can continue... * Trying a partial resynchronization (request 50da9eff70d774f4e6cb723eb4b091440f215772:1). ``` and then hang for 5 seconds: ``` # Timeout connecting to the MASTER... * Reconnecting to MASTER 127.0.0.1:21112 after failure ``` but instead it got this (looks like it disconnected too early, and then tried to re-connect): ``` 10890:M 19 Dec 2022 01:32:54.794 * Ready to accept connections tls 10890:M 19 Dec 2022 01:32:54.809 - Accepted 127.0.0.1:41047 10890:M 19 Dec 2022 01:32:54.878 - Reading from client: error:0A000126:SSL routines::unexpected eof while reading 10890:M 19 Dec 2022 01:32:54.925 - Accepted 127.0.0.1:39207 10890:S 19 Dec 2022 01:32:55.463 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer. 10890:S 19 Dec 2022 01:32:55.463 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:32:55.463 * MASTER <-> REPLICA sync started 10890:S 19 Dec 2022 01:32:55.463 * REPLICAOF 127.0.0.1:24126 enabled (user request from 'id=4 addr=127.0.0.1:39207 laddr=127.0.0.1:24125 fd=8 name= age=1 idle=0 flags=N db=9 sub=0 psub=0 ssub=0 multi=-1 qbuf=43 qbuf-free=20431 argv-mem=21 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=22317 events=r cmd=slaveof user=default redir=-1 resp=2') ### Starting test Slave enters handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:32:55.476 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:00.701 # Failed to read response from the server: (null) <- note this!! 10890:S 19 Dec 2022 01:33:00.701 # Master did not respond to command during SYNC handshake 10890:S 19 Dec 2022 01:33:01.002 * Connecting to MASTER 127.0.0.1:24126 10890:S 19 Dec 2022 01:33:01.002 * MASTER <-> REPLICA sync started ### Starting test Slave is able to detect timeout during handshake in tests/integration/replication.tcl 10890:S 19 Dec 2022 01:33:05.497 * Non blocking connect for SYNC fired the event. 10890:S 19 Dec 2022 01:33:05.500 * Master replied to PING, replication can continue... 10890:S 19 Dec 2022 01:33:05.510 * Trying a partial resynchronization (request 947e1956372a0e6c819cfec51c42cc7979b0c221:1). 10890:S 19 Dec 2022 01:34:05.833 # Failed to read response from the server: error:0A000126:SSL routines::unexpected eof while reading 10890:S 19 Dec 2022 01:34:05.833 # Master did not reply to PSYNC, will try later ``` This PR sets enables the 5 seconds timeout at a later stage to try and prevent the early disconnection.
*	reprocess command when client is unblocked on keys (#11012)	ranshid	2023-01-01	7	-12/+313
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TL;DR --------------------------------------- Following the discussion over the issue [#7551](https://github.com/redis/redis/issues/7551) We decided to refactor the client blocking code to eliminate some of the code duplications and to rebuild the infrastructure better for future key blocking cases. In this PR --------------------------------------- 1. reprocess the command once a client becomes unblocked on key (instead of running custom code for the unblocked path that's different than the one that would have run if blocking wasn't needed) 2. eliminate some (now) irrelevant code for handling unblocking lists/zsets/streams etc... 3. modify some tests to intercept the error in cases of error on reprocess after unblock (see details in the notes section below) 4. replace '$' on the client argv with current stream id. Since once we reprocess the stream XREAD we need to read from the last msg and not wait for new msg in order to prevent endless block loop. 5. Added statistics to the info "Clients" section to report the: * `total_blocking_keys` - number of blocking keys * `total_blocking_keys_on_nokey` - number of blocking keys which have at least 1 client which would like to be unblocked on when the key is deleted. 6. Avoid expiring unblocked key during unblock. Previously we used to lookup the unblocked key which might have been expired during the lookup. Now we lookup the key using NOTOUCH and NOEXPIRE to avoid deleting it at this point, so propagating commands in blocked.c is no longer needed. 7. deprecated command flags. We decided to remove the CMD_CALL_STATS and CMD_CALL_SLOWLOG and make an explicit verification in the call() function in order to decide if stats update should take place. This should simplify the logic and also mitigate existing issues: for example module calls which are triggered as part of AOF loading might still report stats even though they are called during AOF loading. Behavior changes --------------------------------------------------- 1. As this implementation prevents writing dedicated code handling unblocked streams/lists/zsets, since we now re-process the command once the client is unblocked some errors will be reported differently. The old implementation used to issue ``UNBLOCKED the stream key no longer exists`` in the following cases: - The stream key has been deleted (ie. calling DEL) - The stream and group existed but the key type was changed by overriding it (ie. with set command) - The key not longer exists after we swapdb with a db which does not contains this key - After swapdb when the new db has this key but with different type. In the new implementation the reported errors will be the same as if the command was processed after effect: NOGROUP - in case key no longer exists, or WRONGTYPE in case the key was overridden with a different type. 2. Reprocessing the command means that some checks will be reevaluated once the client is unblocked. For example, ACL rules might change since the command originally was executed and will fail once the client is unblocked. Another example is OOM condition checks which might enable the command to run and block but fail the command reprocess once the client is unblocked. 3. One of the changes in this PR is that no command stats are being updated once the command is blocked (all stats will be updated once the client is unblocked). This implies that when we have many clients blocked, users will no longer be able to get that information from the command stats. However the information can still be gathered from the client list. Client blocking --------------------------------------------------- the blocking on key will still be triggered the same way as it is done today. in order to block the current client on list of keys, the call to blockForKeys will still need to be made which will perform the same as it is today: * add the client to the list of blocked clients on each key * keep the key with a matching list node (position in the global blocking clients list for that key) in the client private blocking key dict. * flag the client with CLIENT_BLOCKED * update blocking statistics * register the client on the timeout table Key Unblock --------------------------------------------------- Unblocking a specific key will be triggered (same as today) by calling signalKeyAsReady. the implementation in that part will stay the same as today - adding the key to the global readyList. The reason to maintain the readyList (as apposed to iterating over all clients blocked on the specific key) is in order to keep the signal operation as short as possible, since it is called during the command processing. The main change is that instead of going through a dedicated code path that operates the blocked command we will just call processPendingCommandsAndResetClient. ClientUnblock (keys) --------------------------------------------------- 1. Unblocking clients on keys will be triggered after command is processed and during the beforeSleep 8. the general schema is: 9. For each key k in the readyList: ``` For each client c which is blocked on k: in case either: 1. k exists AND the k type matches the current client blocking type OR 2. k exists and c is blocked on module command OR 3. k does not exists and c was blocked with the flag unblock_on_deleted_key do: 1. remove the client from the list of clients blocked on this key 2. remove the blocking list node from the client blocking key dict 3. remove the client from the timeout list 10. queue the client on the unblocked_clients list 11. NEW: call processCommandAndResetClient(c); ``` NOTE: for module blocked clients we will still call the moduleUnblockClientByHandle which will queue the client for processing in moduleUnblockedClients list. Process Unblocked clients --------------------------------------------------- The process of all unblocked clients is done in the beforeSleep and no change is planned in that part. The general schema will be: For each client c in server.unblocked_clients: * remove client from the server.unblocked_clients * set back the client readHandler * continue processing the pending command and input buffer. Some notes regarding the new implementation --------------------------------------------------- 1. Although it was proposed, it is currently difficult to remove the read handler from the client while it is blocked. The reason is that a blocked client should be unblocked when it is disconnected, or we might consume data into void. 2. While this PR mainly keep the current blocking logic as-is, there might be some future additions to the infrastructure that we would like to have: - allow non-preemptive blocking of client - sometimes we can think that a new kind of blocking can be expected to not be preempt. for example lets imagine we hold some keys on disk and when a command needs to process them it will block until the keys are uploaded. in this case we will want the client to not disconnect or be unblocked until the process is completed (remove the client read handler, prevent client timeout, disable unblock via debug command etc...). - allow generic blocking based on command declared keys - we might want to add a hook before command processing to check if any of the declared keys require the command to block. this way it would be easier to add new kinds of key-based blocking mechanisms. Co-authored-by: Oran Agra <oran@redislabs.com> Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
*	Harden init-tests for cluster tests (#11635)	Madelyn Olson	2022-12-22	2	-6/+25
\| \| \| \| \|	Attempt to harden cluster init-tests by doing two things: * Retry up to 3 times to join the cluster. Cluster meet is entirely idempotent, so it should stabilize if we missed a node. * Validate the connection is actually established, not just exists in the cluster list. Nodes can exist in handshake, but might later get dropped.
*	Fix race in PSYNC2 partial resync test (#11653)	Binbin	2022-12-22	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \|	This test sometimes fails: ``` *** [err]: PSYNC2: Partial resync after Master restart using RDB aux fields with expire in tests/integration/psync2-master-restart.tcl Expected [status ::redis::redisHandle24 sync_partial_ok] == 1 (context: type eval line 49 cmd {assert {[status $replica sync_partial_ok] == 1}} proc ::test) ``` This is because the default repl-timeout value is 10s, sometimes the test got timeout, then it will do a reconnect, it will incr the sync_partial_ok counter, and then cause the test to fail.In this fix, we set the repl-timeout to a very large number to make sure we won't get the timeout.
*	Fix flaky PTTL time to live in milliseconds test on slow machines (#11651)	Binbin	2022-12-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This test failed in FreeBSD: ``` *** [err]: PTTL returns time to live in milliseconds in tests/unit/expire.tcl Expected 836 > 900 && 836 <= 1000 (context: type eval line 5 cmd {assert {$ttl > 900 && $ttl <= 1000}} proc ::test) ``` On some slow machines, sometimes the test take close to 200ms to finish. We only set aside 100ms, so that caused the failure. Since the failure was around 800, change the condition to be >500.
*	Cleanup: Get rid of server.core_propagates (#11572)	guybe7	2022-12-20	2	-3/+28
\| \| \| \| \| \| \| \|	1. Get rid of server.core_propagates - we can just rely on module/call nesting levels 2. Rename in_nested_call to execution_nesting and update the comment 3. Remove module_ctx_nesting (redundant, we can use execution_nesting) 4. Modify postExecutionUnitOperations according to the comment (The main purpose of this PR) 5. trackingHandlePendingKeyInvalidations: Check the nesting level inside this function
*	fix race in list test with blocking commands (#11627)	Oran Agra	2022-12-18	2	-12/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've seen the `BRPOPLPUSH with multiple blocked clients` test hang. this probably happened because rd2 blocked before rd1 and then it was also released first, and rd1 remained blocked. ``` r del blist{t} target1{t} target2{t} r set target1{t} nolist $rd1 brpoplpush blist{t} target1{t} 0 $rd2 brpoplpush blist{t} target2{t} 0 r lpush blist{t} foo assert_error "WRONGTYPE" {$rd1 read} assert_equal {foo} [$rd2 read] assert_equal {foo} [r lrange target2{t} 0 -1] ``` changes: added all missing calls for wait_for_blocked_client after issuing blocking commands) * removed some excessive `after 100` * fix undetected crossslot error in BRPOPLPUSH test * rollback changes to proto-max-bulk-len so external tests can be rerun
*	fix flaky latency test (#11636)	Oran Agra	2022-12-18	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix a flaky test that probably fails on overload timing issues. This unit starts with ``` # Set a threshold high enough to avoid spurious latency events. r config set latency-monitor-threshold 200 ``` but later the test measuring expire event changes the threshold. this fix is to revert it to 200 after that test. Got this error (ARM+TLS) ``` *** [err]: LATENCY RESET is able to reset events in tests/unit/latency-monitor.tcl Expected [r latency latest] eq {} (context: type eval line 3 cmd {assert {[r latency latest] eq {}}} proc ::test) ```
*	Fixed small distance replies on GEODIST and GEO commands WITHDIST (#11631)	filipe oliveira	2022-12-15	1	-0/+7
\| \| \| \| \| \|	Fixes a regression introduced by #11552 in 7.0.6. it causes replies in the GEO commands to contain garbage when the result is a very small distance (less than 1) Includes test to confirm indeed with junk in buffer now we properly reply
*	Fix races in swapdb async_loading test (#11613)	Binbin	2022-12-13	1	-4/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a race in the test: ``` *** [err]: Diskless load swapdb (async_loading): new database is exposed after swapping in tests/integration/replication.tcl Expected 'myvalue' to be equal to '' (context: type eval line 3 cmd {assert_equal [$replica GET mykey] ""} proc ::test) ``` When doing `$replica GET mykey`, the replica is using the old database. The reason may be that when doing `master client kill type replica`, the replica did not yet realize it got disconnected from the master. So the check of master_link_status fails, and the replica did not finish the swapdb and the loading. In that case, i think the solution is to check the sync_full stat on the master and wait for it to get incremented from the previous value. i.e. the way to know that we're done with the full sync is not to check that our state is up (could be up if we check too early), but rather check that the sync_full counter got incremented. During the reviewing, we found another race, in Aborted testType, the `$master config set rdb-key-save-delay 10000` is done after we already initiated the disconnection, so there's a chance that the replica will attempt to reconnect before that call, in which case if we fork() before it, the config will not take effect. Move it to above the disconnection. Co-authored-by: Oran Agra <oran@redislabs.com>