summaryrefslogtreecommitdiff
path: root/src/db.c
diff options
context:
space:
mode:
authorMeir Shpilraien (Spielrein) <meir@redis.com>2022-08-24 12:51:36 +0300
committerGitHub <noreply@github.com>2022-08-24 12:51:36 +0300
commitc1bd61a4a50225982d4a9bc7c66fec913b0c9af2 (patch)
tree7dbf99f4377f69ccd1ded7b82f8a889305a76674 /src/db.c
parent41d9eb0291417c36d694894c936d8f4f29ec5504 (diff)
downloadredis-c1bd61a4a50225982d4a9bc7c66fec913b0c9af2.tar.gz
Reverts most of the changes of #10969 (#11178)
The PR reverts the changes made on #10969. The reason for revert was trigger because of occasional test failure that started after the PR was merged. The issue is that if there is a lazy expire during the command invocation, the `del` command is added to the replication stream after the command placeholder. So the logical order on the primary is: * Delete the key (lazy expiration) * Command invocation But the replication stream gets it the other way around: * Command invocation (because the command is written into the placeholder) * Delete the key (lazy expiration) So if the command write to the key that was just lazy expired we will get inconsistency between primary and replica. One solution we considered is to add another lazy expire replication stream and write all the lazy expire there. Then when replicating, we will replicate the lazy expire replication stream first. This will solve this specific test failure but we realize that the issues does not ends here and the more we dig the more problems we find.One of the example we thought about (that can actually crashes Redis) is as follow: * User perform SINTERSTORE * When Redis tries to fetch the second input key it triggers lazy expire * The lazy expire trigger a module logic that deletes the first input key * Now Redis hold the robj of the first input key that was actually freed We believe we took the wrong approach and we will come up with another PR that solve the problem differently, for now we revert the changes so we will not have the tests failure. Notice that not the entire code was revert, some parts of the PR are changes that we would like to keep. The changes that **was** reverted are: * Saving a placeholder for replication at the beginning of the command (`call` function) * Order of the replication stream on active expire and eviction (we will decide how to handle it correctly on follow up PR) * `Spop` changes are no longer needed (because we reverted the placeholder code) Changes that **was not** reverted: * On expire/eviction, wrap the `del` and the notification effect in a multi exec. * `PropagateNow` function can still accept a special dbid, -1, indicating not to replicate select. * Keep optimisation for reusing the `alsoPropagate` array instead of allocating it each time. Tests: * All tests was kept and only few tests was modify to work correctly with the changes * Test was added to verify that the revert fixes the issues.
Diffstat (limited to 'src/db.c')
-rw-r--r--src/db.c2
1 files changed, 1 insertions, 1 deletions
diff --git a/src/db.c b/src/db.c
index 04ab455fd..36de2aa0a 100644
--- a/src/db.c
+++ b/src/db.c
@@ -1557,9 +1557,9 @@ void deleteExpiredKeyAndPropagate(redisDb *db, robj *keyobj) {
dbSyncDelete(db,keyobj);
latencyEndMonitor(expire_latency);
latencyAddSampleIfNeeded("expire-del",expire_latency);
+ notifyKeyspaceEvent(NOTIFY_EXPIRED,"expired",keyobj,db->id);
signalModifiedKey(NULL, db, keyobj);
propagateDeletion(db,keyobj,server.lazyfree_lazy_expire);
- notifyKeyspaceEvent(NOTIFY_EXPIRED,"expired",keyobj,db->id);
server.stat_expiredkeys++;
}