Commit message | Author | Age | Files | Lines
* ae.c: instead of not firing, on AE_BARRIER invert the sequence.fsync-safetyantirez2018-02-271-22/+38
| | | | | | | | | | | | AE_BARRIER was implemented like: - Fire the readable event. - Do not fire the writable event if the readable fired. However this may lead to the writable event never being called if the readable event always fires. There is an alternative: we can simply invert the sequence of the calls when AE_BARRIER is set. This commit does that.
* AOF: fix a bug that may prevent proper fsyncing when fsync=always.antirez2018-02-271-6/+18
| | | | | | | | | | | In case the write handler is already installed, it could happen that we serve the reply of a query in the same event loop cycle we received it, preventing beforeSleep() from guaranteeing that we do the AOF fsync before sending the reply to the client. The AE_BARRIER mechanism, introduced in a previous commit, prevents this problem. This commit makes actual use of this new feature to fix the bug.
* Cluster: improve crash-recovery safety after failover auth vote.antirez2018-02-271-2/+3
| | | | | | | | Add AE_BARRIER to the writable event loop so that slaves requesting votes can't be served before we re-enter the event loop in the next iteration, so clusterBeforeSleep() will fsync to disk in time. Also add the call to explicitly fsync, given that we modified the last vote epoch variable.
* ae.c: introduce the concept of read->write barrier.antirez2018-02-232-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AOF fsync=always, and certain Redis Cluster bus operations, require fsyncing data to disk before replying with an acknowledgment. In such cases, in order to implement Group Commits, we want to be sure that queries read in a given cycle of the event loop are never served to clients in the same event loop iteration. This way, by using the event loop "before sleep" callback, we can fsync the information just once before returning into the event loop for the next cycle. This is much more efficient than calling fsync() multiple times. Unfortunately, because of a bug, this was not always guaranteed: the order in which the events were installed was the only thing controlling the ordering. Normally this problem is hard to trigger when AOF is enabled with fsync=always, because we try to flush the output buffers to the socket directly in the beforeSleep() function of Redis. However if the output buffers are full, we actually install a write event, and in such a case this bug could happen. This change to ae.c modifies the event loop implementation to make this concept explicit. Write events that are registered with: AE_WRITABLE|AE_BARRIER are guaranteed to never fire after the readable event was fired for the same file descriptor. In this way we are sure that data is persisted to disk before the client performing the operation receives an acknowledgment. However note that these semantics do not provide all the guarantees that one may believe are automatically provided. Take the example of the blocking list operations in Redis. With AOF and fsync=always we could have: Client A doing: BLPOP myqueue 0 Client B doing: RPUSH myqueue a b c In this scenario, Client A will get the "a" element immediately after Client B's RPUSH is executed, even before the operation is persisted. 
However when Client B gets the acknowledgment, it can be sure that "b,c" are already safe on disk inside the list. What to note here is that it cannot be assumed that Client A receiving the element is a guarantee that the operation succeeded from the point of view of Client B. This is due to the fact that the barrier exists within the same socket, and not between different sockets. However in the case above, the element "a" was not going to be persisted regardless, so it is a pretty synthetic argument.
* Fix ziplist prevlen encoding description. See #4705.antirez2018-02-231-6/+6
|
* Track number of logically expired keys still in memory.antirez2018-02-193-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds two new fields in the INFO output, stats section: expired_stale_perc:0.34 expired_time_cap_reached_count:58 The first field is an estimate of the number of keys that are still in memory but are already logically expired. The reason why those keys are not yet reclaimed is that the active expire cycle can't spend more time on the process of reclaiming them, and at the same time nobody is accessing them. However, as the active expire cycle runs, even though it will eventually have to return to the caller because of the time limit or because fewer than 25% of the keys in each given database are logically expired, it collects the stats needed to populate this INFO field. Note that expired_stale_perc is a running average, where the current sample accounts for 5% and the history for 95%, so you'll see it changing smoothly over time. The other field, expired_time_cap_reached_count, counts the number of times the expire cycle had to stop because of the time limit, even if it was still finding a sizeable number of keys yet to expire. This allows people handling operations to understand whether the Redis server, during mass-expiration events, is usually able to collect keys fast enough. It is normal for this field to increment during mass expires, but normally it should increment very rarely. When instead it constantly increments, it means that the current workload is using a very significant percentage of CPU time to expire keys. This feature was created thanks to the hints of Rashmi Ramesh and Bart Robinson from Twitter. In private email exchanges, they noted how important it was to improve the observability of this parameter in the Redis server. Actually, in big deployments, the keys that are already logically expired but not yet reclaimed in each server may account for a very large amount of wasted memory.
* Remove non semantical spaces from module.c.antirez2018-02-151-41/+36
|
* Merge pull request #4479 from dvirsky/notifySalvatore Sanfilippo2018-02-155-36/+302
|\ | | | | Keyspace notifications API for modules
| * Add doc comment about notification flagsDvir Volk2018-02-141-0/+1
| |
| * Add REDISMODULE_NOTIFY_STREAM flag to support stream notificationsDvir Volk2018-02-141-1/+2
| |
| * Fix indentation and comment style in testmoduleDvir Volk2018-02-141-98/+92
| |
| * Use one static client for all keyspace notification callbacksDvir Volk2018-02-141-7/+11
| |
| * Remove the NOTIFY_MODULE flag and simplify the module notification flow if ↵Dvir Volk2018-02-143-9/+5
| | | | | | | | there aren't subscribers
| * Document flags for notificationsDvir Volk2018-02-141-1/+17
| |
| * removed some trailing whitespacesDvir Volk2018-02-141-2/+0
| |
| * removed hellonotify.cDvir Volk2018-02-143-87/+1
| |
| * fixed testDvir Volk2018-02-141-1/+7
| |
| * finished implementation of notifications. Tests unfinishedDvir Volk2018-02-147-2/+338
| |
* | Fix typo in notifyKeyspaceEvent() comment.antirez2018-02-151-1/+1
|/
* Merge pull request #4685 from charsyam/refactoring/set_max_latencySalvatore Sanfilippo2018-02-131-2/+2
|\ | | | | Removing duplicated code to set max latency
| * getting rid of duplicated codecharsyam2018-02-141-2/+2
|/
* More verbose logging when slave sends errors to master.antirez2018-02-131-2/+6
| | | | See #3832.
* Merge pull request #3832 from oranagra/slave_reply_to_master_prSalvatore Sanfilippo2018-02-131-0/+2
|\ | | | | when a slave responds with an error on commands that come from master, log it
| * when a slave experiences an error on commands that come from master, print ↵oranagra2017-02-231-0/+2
| | | | | | | | | | | | | | | | to the log. Since the slave isn't replying to its master, these errors go unnoticed. Since we don't expect the master to send garbage to the slave, this should be safe (as long as we don't log OOM errors there).
* | Merge pull request #3745 from guybe7/unstableSalvatore Sanfilippo2018-02-133-2/+7
|\ \ | | | | | | enlarged buffer given to ld2string
| * \ Merge branch 'unstable' of https://github.com/antirez/redis into unstableGuy Benoish2017-05-09178-17301/+8765
| |\ \
| * \ \ Merge branch 'unstable' of https://github.com/antirez/redis into unstableGuy Benoish2017-03-02182-6523/+17985
| |\ \ \
| * | | | enlarged buffer given to ld2stringGuy Benoish2017-01-113-2/+7
| | | | |
* | | | | Make it explicit with a comment why we kill the old AOF rewrite.antirez2018-02-131-0/+3
| | | | | | | | | | | | | | | | | | | | See #3858.
* | | | | rewriteAppendOnlyFileBackground() failure fixGuy Benoish2018-02-131-21/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to do BGREWRITEAOF even if appendonly=no. This is by design. stopAppendonly() didn't turn off aof_rewrite_scheduled (it can be turned on again by BGREWRITEAOF even while appendonly is off anyway). After configuring `appendonly yes` it will see that the state is AOF_OFF, there's no RDB fork, so it will do rewriteAppendOnlyFileBackground() which will fail since the aof_child_pid is set (was scheduled and started by cron). Solution: stopAppendonly() will turn off the schedule flag (regardless of who asked for it). startAppendonly() will terminate any existing fork and start a new one (so it is the most recent).
* | | | | Merge pull request #4684 from oranagra/latency_monitor_maxSalvatore Sanfilippo2018-02-131-0/+1
|\ \ \ \ \ | | | | | | | | | | | | fix to latency monitor reporting wrong max latency
| * | | | | fix to latency monitor reporting wrong max latencyOran Agra2018-02-131-0/+1
|/ / / / / | | | | | | | | | | | | | | | | | | | | in some cases LATENCY HISTORY reported latency that was higher than the max latency reported by LATENCY LATEST / DOCTOR
* | | | | Rax updated to latest antirez/rax commit.antirez2018-02-021-2/+2
| | | | |
* | | | | Merge pull request #4269 from jianqingdu/unstableSalvatore Sanfilippo2018-01-241-2/+2
|\ \ \ \ \ | | | | | | | | | | | | fix not call va_end() when syncWrite() failed
| * | | | | fix missing va_end() call when syncWrite() failsjianqingdu2017-08-301-2/+2
| | | | | | | | | | | | | | | | | | fix missing va_end() call when syncWrite() fails in sendSynchronousCommand()
* | | | | | Merge pull request #4628 from mnunberg/patch-1Salvatore Sanfilippo2018-01-241-1/+1
|\ \ \ \ \ \ | | | | | | | | | | | | | | redismodule.h: Check ModuleNameBusy before calling it
| * | | | | | redismodule.h: Check ModuleNameBusy before calling itMark Nunberg2018-01-231-1/+1
| | | | | | | | | | | | | | | | | | | | | Older versions might not have this function.
* | | | | | | Fix integration test NOREPLICAS error time dependent false positive.antirez2018-01-241-3/+6
|/ / / / / /
* | | | | | Fix migrateCommand() access of not initialized byte.antirez2018-01-181-2/+5
| | | | | |
* | | | | | Replication buffer fills up on high rate traffic.Guy Benoish2018-01-181-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When feeding the master with high-rate traffic, the slave's feed is much slower. This causes the replication buffer to grow (indefinitely), which leads to slave disconnection. The problem is that writeToClient() decides to stop writing after NET_MAX_WRITES_PER_EVENT writes (in order to be fair to clients). We should ignore this limit when the client is a slave: it's better if regular clients wait longer, since the alternative is that the slave has no chance to stay in sync in this situation.
* | | | | | Cluster: improve anti-affinity algo in redis-trib.rb.antirez2018-01-181-1/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See #3462 and related PRs. We use a simple algorithm to calculate the level of affinity violation, and then an optimizer that performs random swaps until things improve.
* | | | | | Remove useless comment from serverCron().antirez2018-01-171-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | The behavior is well specified by the code itself.
* | | | | | Merge pull request #4546 from hqin6/unstableSalvatore Sanfilippo2018-01-171-1/+3
|\ \ \ \ \ \ | | | | | | | | | | | | | | fix bug #4545: dead loop in AOF rewrite
| * | | | | | fix bug #4545: dead loop in AOF rewriteheqin2018-01-171-1/+1
| | | | | | |
| * | | | | | fix bug #4545: dead loop in AOF rewriteheqin2017-12-181-1/+3
| | | | | | |
* | | | | | | Merge pull request #4609 from Qinch/unstableSalvatore Sanfilippo2018-01-171-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | fix assert problem in ZIP_DECODE_PREVLENSIZE macro
| * | | | | | | fix assert problem in ZIP_DECODE_PREVLENSIZEqinchao2018-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See issue: https://github.com/antirez/redis/issues/4587
* | | | | | | | Hopefully more clear comment to explain the change in #4607.antirez2018-01-161-3/+4
|/ / / / / / /
* | | | | | | Merge pull request #4607 from oranagra/psync2_backlogSalvatore Sanfilippo2018-01-161-0/+5
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | PSYNC2 fix - promoted slave should hold on to its backlog
| * | | | | | | PSYNC2 fix - promoted slave should hold on to its backlogOran Agra2018-01-161-0/+5
|/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | after a slave is promoted (assuming it has no slaves and it booted over an hour ago), it will lose its replication backlog at the next replication cron, rather than waiting for slaves to connect to it. So on a simple master/slave failover, if the new slave doesn't connect immediately, it may be too late and PSYNC2 will fail.