path: root/items.h
Commit message (author, date, files changed, lines -/+)
* rm_lru_maintainer_initialized (xuesenliang, 2023-03-08, 1 file, -1/+0)
* core: remove *conn object from cache commands (dormando, 2023-01-11, 1 file, -4/+4)
    We want to start using cache commands in contexts without a client
    connection, but the client object has always been passed to all
    functions. In most cases we only need the worker thread
    (LIBEVENT_THREAD *t), so this change adjusts the arguments passed in.
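    For illustration, the kind of signature change described might look
    like this (hypothetical prototypes, not the exact memcached API):

        #include <stddef.h>

        /* Hypothetical before/after sketch of the argument change. */
        typedef struct libevent_thread LIBEVENT_THREAD; /* worker thread */
        typedef struct conn conn;                       /* client connection */

        /* Before: every cache command required a client connection. */
        void *item_get_old(const char *key, size_t nkey, conn *c);

        /* After: only the worker thread is needed, so the command can be
         * issued from contexts with no client attached. */
        void *item_get_new(const char *key, size_t nkey, LIBEVENT_THREAD *t);
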
* core: move more storage functions to storage.c (dormando, 2020-10-30, 1 file, -13/+0)
    extstore.h is now only used from storage.c. starting a path towards
    getting the storage interface to be more generalized. should be no
    functional changes.
* meta text protocol commands (dormando, 2019-09-30, 1 file, -0/+1)
    - we get asked a lot to provide a "metaget" command, for various uses
      (debugging, etc)
    - we also get asked for random one-off commands for various use cases.
    - I really hate both of these situations and have been wanting to
      experiment with a slight tuning of how get commands work for a long
      time.

    Assuming that if I offer a metaget command which gives people the
    information they're curious about in an inefficient format, plus data
    they don't need, we'll just end up with a slow command with
    compatibility issues. No matter how you wrap warnings around a
    command, people will put it into production under high load. Then I'm
    stuck with it forever.

    Behold, the meta commands! See doc/protocol.txt and the wiki for a
    full explanation and examples. The intent of the meta commands is to
    support any features the binary protocol had over the text protocol.
    Though this is missing some commands still, it is close and surpasses
    the binary protocol in many ways.
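    As a taste of what the commands look like on the wire, a meta-get
    asking for the value, client flags, and remaining TTL might go
    roughly like this (illustrative transcript only; doc/protocol.txt is
    the authority on flags and responses):

        mg foo v f t
        VA 3 f0 t92
        bar
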
* restartable cache (dormando, 2019-09-17, 1 file, -0/+2)
    "-e /path/to/tmpfsmnt/file"
    SIGUSR1 for graceful stop

    restart requires the same memory limit, slab sizes, and some other
    infrequently changed details. Most other options and features can
    change between restarts. Binary can be upgraded between restarts.

    Restart does some fixup work on start for every item in cache. Can
    take over a minute with more than a few hundred million items in
    cache.

    Keep in mind when a cache is down it may be missing invalidations,
    updates, and so on.
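    A minimal usage sketch based on the options named above (the paths,
    memory size, and pidof invocation are placeholders):

        # start with the restart file on a tmpfs mount
        memcached -m 4096 -e /tmpfs_mount/memory_file

        # graceful stop so the next start can recover the cache
        kill -USR1 $(pidof memcached)

        # restart with the same memory limit and slab settings
        memcached -m 4096 -e /tmpfs_mount/memory_file
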
* limit crawls for metadumper (dormando, 2018-02-12, 1 file, -0/+1)
    LRU crawler metadumper is used for getting snapshot-y looks at the
    LRU's. Since there's no default limit, it'll get any new items added
    or bumped since the roll started.

    with this change it limits the number of items dumped to the number
    that existed in that LRU when the roll was kicked off. You still end
    up with an approximation, but not a terrible one:
    - items bumped after the crawler passes them likely won't be revisited
    - items bumped before the crawler passes them will likely be visited
      toward the end, or mixed with new items.
    - deletes are somewhere in the middle.
* external storage base commit (dormando, 2017-11-28, 1 file, -0/+13)
    been squashing, reorganizing, and pulling code off to go upstream
    ahead of merging the whole branch.
* make lru_pull_tail function public (dormando, 2017-09-26, 1 file, -0/+13)
    used in separate file for flash branch.
* interface code for flash branch (dormando, 2017-09-26, 1 file, -1/+1)
    removes a few ifdef's and upstreams small internal interface tweaks
    for easy rebase.
* allow pulling a tail item directly (dormando, 2017-09-26, 1 file, -0/+1)
    plumbing for doing inline reclaim, or similar.
* add a real slab automover algorithm (dormando, 2017-06-23, 1 file, -0/+8)
    converts the python script to C, more or less.
* LRU crawler scheduling improvements (dormando, 2017-05-29, 1 file, -0/+1)
    when trying to manually run a crawl, the internal autocrawler is now
    blocked from restarting for 60 seconds.

    the internal autocrawl now independently schedules LRU's, and can
    re-schedule sub-LRU's while others are still running. should allow
    much better memory control when some sub-lru's (such as TEMP or WARM)
    are small, or slab classes are differently sized.

    this also makes the crawler drop its lock frequently.. this fixes an
    issue where a long crawl happening at the same time as a hash table
    expansion could hang the server until the crawl finished.

    to improve still:
    - elapsed time can be wrong in the logger entry
    - need to cap number of entries scanned. enough set pressure and a
      crawl may never finish.
* refactor chunk chaining for memory efficiency [1.4.36] (dormando, 2017-03-19, 1 file, -0/+1)
    Memory chunk chains would simply stitch multiple chunks of the
    highest slab class together. If your item was 17k and the chunk limit
    is 16k, the item would use 32k of space instead of a bit over 17k.

    This refactor simplifies the slab allocation path and pulls the
    allocation of chunks into the upload process. A "large" item gets a
    small chunk assigned as an object header, rather than attempting to
    inline a slab chunk into a parent chunk. It then gets chunks
    individually allocated and added into the chain while the object
    uploads.

    This solves a lot of issues:

    1) When assembling new, potentially very large items, we don't have
    to sit and spin evicting objects all at once. If there are 20 16k
    chunks in the tail and we allocate a 1 meg item, the new item will
    evict one of those chunks in between each read, rather than trying to
    guess how many loops to run before giving up. Very large objects take
    time to read from the socket anyway.

    2) Simplifies code around the initial chunk. Originally embedding
    data into the top chunk and embedding data at the same time required
    a good amount of fiddling. (Though this might flip back to embedding
    the initial chunk if I can clean it up a bit more.)

    3) Pulling chunks individually means the slabber code can be
    flattened to not think about chunks aside from freeing them, which
    culled a lot of code and removed branches from a hot path.

    4) The size of the final chunk is naturally set to the remaining
    amount of bytes that need to be stored, which means chunks from
    another slab class can be pulled to "cap off" a large item, reducing
    memory overhead.
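    A rough sketch of the layout described, where a small header chunk
    anchors a chain of separately allocated data chunks (field names are
    illustrative, not the actual memcached structs):

        #include <stddef.h>

        /* Each data chunk is pulled from the slab allocator as the value
         * is read from the socket; the last chunk can come from a smaller
         * slab class to "cap off" the item. */
        struct chunk {
            struct chunk *next;  /* next chunk in the chain, NULL at tail */
            size_t size;         /* usable bytes in this chunk */
            size_t used;         /* bytes filled so far */
            char data[];         /* payload */
        };

        /* A "large" item starts as a small header that owns the chain. */
        struct large_item_header {
            struct chunk *chain;
            size_t total_bytes;  /* full value length */
        };
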
* use LRU thread for COLD -> WARM bumps (dormando, 2017-01-30, 1 file, -0/+4)
    Previous tree fixed a problem; active items needed to be processed
    from the tail of COLD, which makes evictions harder without evicting
    active items. COLD bumps were modified to be immediate (old style).

    This uses a per-worker-thread mostly-nonblocking queue that the LRU
    thread consumes for COLD bumps. In most cases, hits to COLD are
    1/10th or less than the other classes. On high rates of access where
    the buffers fill, those items simply don't get their ACTIVE bit set.
    If they get hit again with free space, they will be processed then.
    This prevents regressions from high speed keyspace scans.
* NOEXP_LRU is now TEMP_LRU (dormando, 2017-01-22, 1 file, -1/+1)
    Confident the other feature was never used; and if someone wants it,
    it's easy to restore by allowing exptime of 0 to go into TEMP_LRU.
    This could possibly become a default, or at least recommended.
* Do LRU-bumps while already holding item lock (dormando, 2017-01-22, 1 file, -1/+1)
    item_get() would hash, item_lock, fetch item. consumers which can
    bump the LRU would then call item_update(), which would hash,
    item_lock, then update the item. Good performance bump by inlining
    the LRU bump when it's necessary.
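    A minimal sketch of the "bump under the already-held lock" idea
    (function names resemble memcached's, but the signatures here are
    simplified and hypothetical):

        #include <stddef.h>
        #include <stdint.h>

        typedef struct _item item;

        extern uint32_t hash(const char *key, size_t nkey);
        extern void item_lock(uint32_t hv);
        extern void item_unlock(uint32_t hv);
        extern item *assoc_find(const char *key, size_t nkey, uint32_t hv);
        extern void do_item_update(item *it); /* LRU bump, caller holds lock */

        item *item_get_and_bump(const char *key, size_t nkey) {
            uint32_t hv = hash(key, nkey);
            item_lock(hv);
            item *it = assoc_find(key, nkey, hv);
            if (it != NULL) {
                /* previously the caller re-hashed and re-locked inside
                 * item_update(); now the bump reuses this lock */
                do_item_update(it);
            }
            item_unlock(hv);
            return it;
        }
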
* pull LRU crawler out into its own file. (dormando, 2016-08-19, 1 file, -11/+13)
    ~600 lines gone from items.c makes it a lot more manageable. this
    change is almost purely moving code around and renaming functions.
    very little logic has changed.
* refactor checkpoint for LRU crawler (dormando, 2016-08-19, 1 file, -3/+2)
    now has internal module system for the LRU crawler. autocrawl checker
    should be a bit better now. doesn't constantly re-run the histogram
    calcs.

    metadump works as a module now. ended up generalizing the client case
    outside of the module system since it looks reusable. Cut the amount
    of functions required for metadump specifically to nothing.

    still need to bug hunt, a few more smaller refactors, and see about
    pulling this out into its own file.
* prototype functionality for LRU metadumper (dormando, 2016-08-19, 1 file, -1/+2)
    Functionality is nearly all there. A handful of FIXME's and TODO's to
    address. From there it needs to be refactored into something proper.
* fix zero hash items eviction (Eiichi Tsukata, 2016-07-13, 1 file, -1/+1)
    If all hash values of five tail items are zero on the specified slab
    class, the expire check is unintentionally skipped and the items stay
    without being evicted. Consequently, new item allocation consumes
    memory space every time an item is set, which leads to slab OOM
    errors.
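    The pitfall generalizes: a hash value of zero is as legitimate as any
    other, so it cannot double as a "nothing computed here" sentinel. A
    hedged sketch of the safer pattern (not the actual memcached fix):

        #include <stdbool.h>
        #include <stdint.h>

        struct tail_probe {
            uint32_t hv;    /* hash of the candidate tail item */
            bool hv_valid;  /* track "computed" explicitly rather than
                             * treating hv == 0 as "skip the check" */
        };
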
* finish stats_sizes rewrite (dormando, 2016-06-24, 1 file, -0/+4)
    Now relies on CAS feature for runtime enable/disable tracking. Still
    usable if enabled at start time with CAS disabled.

    Also adds start option `-o track_sizes`, and a stat for
    `stats settings`. Finally, adds documentation and cleans up status
    outputs.

    Could use some automated tests but not make or break for release.
* online hang-free "stats sizes" command. (dormando, 2016-06-24, 1 file, -0/+2)
    "stats sizes" is one of the last cache-hanging commands. With
    millions of items it can hang for many seconds.

    This commit changes the command to be dynamic. A histogram is tracked
    as items are linked and unlinked from the cache. The tracking is
    enabled or disabled at runtime via "stats sizes_enable" and
    "stats sizes_disable".

    This presently "works" but isn't accurate. Giving it some time to
    think over before switching to requiring that CAS be enabled.
    Otherwise the values could underflow if items are removed that
    existed before the sizes tracker is enabled. This attempts to work
    around it by using it->time, which gets updated on fetch, and is thus
    inaccurate.
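    The runtime flow described above looks roughly like this over the
    text protocol (the counts are made up; output follows the usual
    STAT ... END framing):

        stats sizes_enable
        stats sizes
        STAT 96 1024
        STAT 128 512
        END
        stats sizes_disable
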
* treat and print item flags as unsigned int (dormando, 2016-06-23, 1 file, -1/+1)
    most of the code would parse and handle flags as unsigned int, but
    passed into alloc functions as a signed int... which would then
    continue to print it as unsigned up until a change made in 2007. Now
    treat it fully as unsigned and print as unsigned.
* Implement get_expired stats (sergiocarlos, 2016-05-28, 1 file, -2/+2)
* first half of new slab automover (dormando, 2015-11-18, 1 file, -1/+0)
    If any slab classes have more than two pages worth of free chunks,
    attempt to free one page back to a global pool. Create new concept of
    a slab page move destination of "0", which is a global page pool.
    Pages can be re-assigned out of that pool during allocation.

    Combined with item rescuing from the previous patch, we can safely
    shuffle pages back to the reassignment pool as chunks free up
    naturally. This should be a safe default going forward. Users should
    be able to decide to free or move pages based on eviction pressure as
    well. This is coming up in another commit.

    This also fixes a calculation of the NOEXP LRU size, and completely
    removes the old slab automover thread. Slab automove decisions will
    now be part of the lru maintainer thread.
* slab mover rescues valid items with free chunks (dormando, 2015-11-18, 1 file, -0/+2)
    During a slab page move items are typically ejected regardless of
    their validity. Now, if an item is valid and free chunks are
    available in the same slab class, copy the item over and replace it.

    It's up to external systems to try to ensure free chunks are
    available before moving a slab page. If there is no memory it will
    simply evict them as normal.

    Also adds counters so we can finally tell how often these cases
    happen.
* ding-dong the cache_lock is dead. (dormando, 2015-01-09, 1 file, -1/+0)
* LRU maintainer thread now fires LRU crawler (dormando, 2015-01-06, 1 file, -1/+1)
    ... if available. Very simple starter heuristic for how often to run
    the crawler. At this point, this patch series should have a
    significant impact on hit ratio.
* direct reclaim mode for evictions (dormando, 2015-01-04, 1 file, -2/+0)
    Only way to do eviction case fast enough is to inline it, sadly. This
    finally deletes the old item_alloc code now that I'm not intending on
    reusing it.

    Also removes the condition wakeup for the background thread. Instead
    runs on a timer, and meters its aggressiveness by how much shuffling
    is going on.

    Also fixes a segfault in lru_pull_tail(), was unlinking `it` instead
    of `search`.
* first pass at LRU maintainer thread (dormando, 2015-01-03, 1 file, -0/+7)
    The basics work, but tests still do not pass. A background thread
    wakes up once per second, or when signaled. It is signaled if a slab
    class gets an allocation request and has fewer than N chunks free.

    The background thread shuffles LRU's: HOT, WARM, COLD. HOT is where
    new items exist. HOT and WARM flow into COLD. Active items in COLD
    flow back to WARM. Evictions are pulled from COLD. item_update's no
    longer do anything (and need to be fixed to tick it->time). Items are
    reshuffled within or around LRU's as they reach the bottom.

    Ratios of HOT/WARM memory are hardcoded, as are the low/high
    watermarks. Thread is not fast enough right now, sets cannot block on
    it.
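    Conceptually, each slab class's LRU is split into sub-LRU's along
    these lines (a sketch only; the real constants and structs live in
    the source, not here):

        /* New items enter HOT; HOT and WARM overflow into COLD; items in
         * COLD that are hit again are moved back to WARM; evictions come
         * from the COLD tail. */
        enum sub_lru { HOT_LRU, WARM_LRU, COLD_LRU };

        struct item_stub;                 /* stand-in for the real item */

        struct lru_class {
            struct item_stub *heads[3];
            struct item_stub *tails[3];
            unsigned long long bytes[3];  /* enforce HOT/WARM ratio caps */
        };
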
* Beginning work for LRU rework (dormando, 2015-01-02, 1 file, -4/+6)
    Primarily splitting cache_lock into a lock-per LRU, and making the
    it->slab_clsid lookup indirect. cache_lock is now more or less gone.

    Stats are still wrong. they need to internally summarize over each
    sub-class.
* flush_all was not thread safe. (dormando, 2015-01-01, 1 file, -1/+0)
    Unfortunately if you disable CAS, all items set in the same second as
    a flush_all will immediately expire. This is the old (2006ish)
    behavior. However, if CAS is enabled (as is the default), it will
    still be more or less exact.

    The locking issue is that if the LRU lock is held, you may not be
    able to modify an item if the item lock is also held. This means that
    some items may not be flushed if locking is done correctly. In the
    current code, it could lead to corruption as an item could be locked
    and in use while the expunging is happening.
* Pause all threads while swapping hash table. (dormando, 2014-12-27, 1 file, -0/+2)
    We used to hold a global lock around all modifications to the hash
    table. Then it was switched to wrapping hash table accesses in a
    global lock during hash table expansion, set by notifying each worker
    thread to change lock styles.

    There was a bug here which causes trylocks to clobber, due to the
    specific item locks not being held during the global lock:
    https://code.google.com/p/memcached/issues/detail?id=370

    The patch previous to this one uses item locks during hash table
    expansion. Since the item lock table is always smaller than the hash
    table, an item lock will always cover both its new and old buckets.
    However, we still need to pause all threads during the pointer swap
    and setup.

    This patch pauses all background threads and worker threads, swaps
    the hash table, then unpauses them. This trades the (possibly
    significant) slowdown during the hash table copy, with a short total
    hang at the beginning of each expansion.

    As previously; those worried about consistent performance can presize
    the hash table with `-o hashpower=n`
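    The mechanism reads roughly like the following (placeholder function
    names; the real coordination lives in assoc.c and thread.c):

        struct bucket;                        /* hash chain heads */
        static struct bucket **primary_hashtable;

        extern void pause_all_threads(void);  /* workers + background */
        extern void resume_all_threads(void);

        void start_expansion(struct bucket **new_table) {
            pause_all_threads();              /* short global stop */
            struct bucket **old = primary_hashtable;
            primary_hashtable = new_table;    /* swap while nothing runs */
            resume_all_threads();
            /* buckets migrate from `old` incrementally afterwards, under
             * the per-item locks described above */
            (void)old;
        }
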
* Avoid OOM errors when locked items stuck in tail (dormando, 2014-10-12, 1 file, -0/+1)
    If a client fetches a few thousand keys, then does not ever read the
    socket, those keys will stay reflocked until the client disconnects
    or resumes. If some of those items are unpopular they can drop to the
    tail, causing all writes in the slab class to OOM.

    This creates some relief by chucking the items back to the head.

    Big thanks to Jay Grizzard and other folks at Box for helping narrow
    this down.
* optionally take a list of slabs to run against. (dormando, 2014-04-17, 1 file, -1/+1)
    lru_crawler crawl 1,2,3,10,20
    will kick crawlers off for all of those slabs in parallel.
* control system (dormando, 2014-04-17, 1 file, -0/+6)
    nothing internally magically fires it off yet, but now there is an
    external command:

    lru_crawler crawl [classid]

    ... will signal the thread to wake up and immediately reap through a
    particular class. need some thought/feedback for internal kickoffs
    (plugins?)
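    Usage, combining this command with the comma-separated list form
    added in the later commit above:

        lru_crawler crawl 1
        lru_crawler crawl 1,2,3,10,20
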
* barebones LRU crawler proof of concept (dormando, 2014-04-17, 1 file, -0/+3)
    so many things undone... TODO is inline in items.c. this seems to
    work, and the locking should be correct. it is a background thread so
    shouldn't cause significant latency. However it does quickly roll
    through the entire LRU (and as of this PoC it just constantly runs),
    so there will be cpu impact.
* remove global stats lock from item allocation (dormando, 2012-09-03, 1 file, -0/+1)
    This doesn't reduce mutex contention much, if at all, for the global
    stats lock, but it does remove a handful of instructions from the
    alloc hot path, which is always worth doing.

    Previous commits possibly added a handful of instructions for the
    loop and for the bucket readlock trylock, but this is still faster
    than .14 for writes overall.
* alloc loop now attempts an item_lock (dormando, 2012-09-03, 1 file, -1/+1)
    Fixes a few issues with a restructuring... I think -M was broken
    before, should be fixed now. It had a refcount leak.

    Now walks up to five items from the bottom in case of the bottommost
    items being item_locked, or refcount locked. Helps avoid excessive
    OOM errors for some oddball cases. Those happen more often if you're
    hammering on a handful of pages in a very large class size (100k+)

    The hash item lock ensures that if we're holding that lock, no other
    thread can be incrementing the refcount lock at that time. It will
    mean more in future patches.

    slab rebalancer gets a similar update.
* initial slab automover (dormando, 2012-01-03, 1 file, -0/+1)
    Enable at startup with -o slab_reassign,slab_automove
    Enable or disable at runtime with "slabs automove 1\r\n"

    Has many weaknesses. Only pulls from slabs which have had zero recent
    evictions. Is slow, not tunable, etc.

    Use the scripts/mc_slab_mover example to write your own external
    automover if this doesn't satisfy.
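    Putting the options from the message together (a usage sketch, not a
    tuning recommendation; the "0" form to disable is implied by the
    enable/disable wording above):

        # enable slab page reassignment plus the built-in automover
        memcached -o slab_reassign,slab_automove

        # toggle the automover at runtime over the text protocol
        slabs automove 1
        slabs automove 0
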
* use item partitioned lock for as much as possible (dormando, 2011-11-09, 1 file, -1/+1)
    push cache_lock deeper into the abyss
* move hash calls outside of cache_lock (dormando, 2011-11-09, 1 file, -6/+6)
    been hard to measure while using the intel hash (since it's very
    fast), but should help with the software hash.
* Backport binary TOUCH/GAT/GATQ commands (dormando, 2011-09-27, 1 file, -0/+1)
    Taken from the 1.6 branch, partly written by Trond. I hope the CAS
    handling is correct.
* Kill off redundant item_init. (Dustin Sallings, 2009-09-11, 1 file, -1/+0)
    These are automatically initialized to 0 (both Trond and the spec say
    so, and I asserted it on all current builders at least once before
    killing it off).
* Don't expose the protocol used to the client api of the stats (Trond Norbye, 2009-04-02, 1 file, -7/+2)
    (dustin) I made some changes to the original growth code to pass in
    the required size.
* "stats reset" should reset eviction counters as wellTrond Norbye2009-03-241-0/+2
| | | | See: http://code.google.com/p/memcached/issues/detail?id=22
* Update CAS on non-replace incr/decr. (Dustin Sallings, 2009-02-11, 1 file, -0/+2)
    This fixes a problem reported as bug 15 where incr and decr do not
    change CAS values when they aren't completely replacing the item
    (which is the typical case).

    http://code.google.com/p/memcached/issues/detail?id=15
* Fix for stats opaque issue pointed out at the hackathon and removed some wasteful function calls (more to come). (Toru Maesaka, 2009-01-03, 1 file, -4/+4)
* The slabber no longer needs an is_binary-like flag for stats due to abstraction by the callback. (Toru Maesaka, 2009-01-03, 1 file, -6/+6)
* Changed the argument ordering for stats callback to something more common. (Toru Maesaka, 2009-01-03, 1 file, -2/+2)