path: root/items.c
* rm_lru_maintainer_initialized (xuesenliang, 2023-03-08, 1 file, -6/+0)
* core: remove *conn object from cache commands (dormando, 2023-01-11, 1 file, -18/+18)
  We want to start using cache commands in contexts without a client connection, but the client object has always been passed to all functions. In most cases we only need the worker thread (LIBEVENT_THREAD *t), so this change adjusts the arguments passed in.
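  A minimal sketch of the shape of this change; the function name and argument list are illustrative, not the actual items.c prototypes:

      #include <stddef.h>

      /* Forward declarations stand in for memcached's real types. */
      typedef struct conn conn;
      typedef struct _libevent_thread LIBEVENT_THREAD;
      typedef struct _stritem item;

      /* before: every cache command carried the full client connection */
      item *do_cache_fetch_old(const char *key, size_t nkey, conn *c);

      /* after: only the worker thread is passed, so the same command can
       * run in contexts with no client connection at all */
      item *do_cache_fetch_new(const char *key, size_t nkey, LIBEVENT_THREAD *t);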
* core: give threads unique names (dormando, 2022-11-01, 1 file, -0/+1)
  allow users to differentiate thread functions externally to memcached. Useful for setting priorities or pinning threads to CPUs.
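  For reference, thread naming on Linux uses the pthread_setname_np() GNU extension; this sketch shows the mechanism, though the exact name strings memcached assigns may differ (build with -pthread):

      #define _GNU_SOURCE
      #include <pthread.h>

      static void *worker(void *arg) {
          (void)arg;
          /* name shows up in top -H, ps -L -o comm, /proc/<pid>/task/<tid>/comm;
           * limited to 15 characters plus the terminating NUL */
          pthread_setname_np(pthread_self(), "mc-worker");
          return NULL;
      }

      int main(void) {
          pthread_t t;
          pthread_create(&t, NULL, worker, NULL);
          pthread_join(t, NULL);
          return 0;
      }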
* core: make large item storage more reliable [1.6.17] (dormando, 2022-08-26, 1 file, -2/+17)
  When allocating sub-max chunks for the tail end of a large item, the allocator would only look at the exact slab class. If items in a cache are all exclusively large, these slab classes could be empty. Now as a fallback it will also check and evict from the largest slab class even if it doesn't necessarily want the largest chunk.
* re-allow aggressive slab mover for global pages (dormando, 2022-08-25, 1 file, -1/+7)
  fixes failing tests and scenarios where a lot of memory is freed up at once.
* extstore: make defaults more aggressive (dormando, 2022-08-25, 1 file, -7/+1)
  extstore has a background thread which examines slab classes for items to flush to disk. The thresholds for flushing to disk are managed by a specialized "slab automove" algorithm. This algorithm was written in 2017 and not tuned since.
  Most serious users set "ext_item_age=0" and force flush all items. This is partially because the defaults do not flush aggressively enough, which causes memory to run out and evictions to happen.
  This change simplifies the slab automove portion. Instead of balancing free chunks of memory per slab class, it sets a target of a certain number of free global pages. The extstore flusher thread also uses the page pool and some low chunk limits to decide when to start flushing. Its sleep routines have also been adjusted as it could oversleep too easily.
  A few other small changes were required to avoid over-moving slab pages around.
* Improve Slab Automove behavior (Iliya, 2022-08-22, 1 file, -3/+10)
  - Skip using crawler items when calculating the automover age stats as they can severely skew the ages in the stats to the point of completely starving particular slabs
  - Include the current window data in the window sum so we don't free pages that are actually needed - this also matches the python script behavior
  - Reset young / old when interrupting the automove decision loop so we don't accidentally move things which we didn't mean to
* Report item sizes for fetch, mutation, and eviction watchers (Kevin Lin, 2021-08-06, 1 file, -2/+2)
  This adds a new field `size` to logger entry lines for item_get, item_store, and eviction events indicating the size of the associated item in bytes.
* meta: protect cachedump from bin keys and add docs (dormando, 2021-06-07, 1 file, -1/+2)
  cachedump was the only place in the codebase I can find which copied the key verbatim. wonder when I can finally remove the command :)
* core: move more storage functions to storage.c (dormando, 2020-10-30, 1 file, -1/+1)
  extstore.h is now only used from storage.c. starting a path towards getting the storage interface to be more generalized. should be no functional changes.
* Remove multiple double-initializations of condition variables and mutexes (Daniel Schemmel, 2019-11-10, 1 file, -4/+1)
  - `slabs_rebalance_lock`
  - `slab_rebalance_cond`
  - `maintenance_lock`
  - `lru_crawler_lock`
  - `lru_crawler_cond`
  - `lru_maintainer_lock`
* Keep "last access" time up to date in SLRU modedormando2019-09-301-18/+4
| | | | | | | | | | | | The last access time used to only update once per minute to avoid excess bumping on hot items. However, with segmented mode if an item is hit a lot it's simply poked in place. Previous to this change we were calling extra functions and branches for no real reason. Also when bumping within the WARM_LRU, we were updating the last access time despite it being a shuffle. Also it was skipping the bump if the access time was too recent, which is one hell of a bug.
* meta text protocol commands (dormando, 2019-09-30, 1 file, -24/+31)
  - we get asked a lot to provide a "metaget" command, for various uses (debugging, etc)
  - we also get asked for random one-off commands for various use cases.
  - I really hate both of these situations and have been wanting to experiment with a slight tuning of how get commands work for a long time.
  Assuming that if I offer a metaget command which gives people the information they're curious about in an inefficient format, plus data they don't need, we'll just end up with a slow command with compatibility issues. No matter how you wrap warnings around a command, people will put it into production under high load. Then I'm stuck with it forever.
  Behold, the meta commands! See doc/protocol.txt and the wiki for a full explanation and examples. The intent of the meta commands is to support any features the binary protocol had over the text protocol. Though this is missing some commands still, it is close and surpasses the binary protocol in many ways.
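  An illustrative meta get exchange, written from memory of the current doc/protocol.txt syntax (which was refined after this commit, so treat the flag letters and response tokens as approximate rather than authoritative): v asks for the value, f for client flags, t for remaining TTL; VA carries a value back, EN is a miss.

      mg foo v f t
      VA 3 f0 t94
      bar
      mg otherkey v
      EN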
* match comment to code (dormando, 2019-09-21, 1 file, -1/+0)
* restartable cache (dormando, 2019-09-17, 1 file, -2/+33)
  "-e /path/to/tmpfsmnt/file"
  SIGUSR1 for graceful stop.
  Restart requires the same memory limit, slab sizes, and some other infrequently changed details. Most other options and features can change between restarts. The binary can be upgraded between restarts.
  Restart does some fixup work on start for every item in cache. It can take over a minute with more than a few hundred million items in cache.
  Keep in mind when a cache is down it may be missing invalidations, updates, and so on.
* add unlock when item_cachedump malloc failed (minkikim89, 2019-08-27, 1 file, -0/+1)
* log client connection id with fetchers and mutations (Tharanga Gamaethige, 2019-08-27, 1 file, -1/+1)
  you can now monitor fetches and mutations of a given client
* move mem_requested from slabs.c to items.c (dormando, 2019-07-26, 1 file, -20/+23)
  mem_requested is an oddball counter: it's the total number of bytes "actually requested" from the slab's caller. It's mainly used for a stats counter, alerting the user that the slab factor may not be efficient if the gap between total_chunks * chunk_size and mem_requested is large.
  However, since chunked items were added it's _also_ used to help the LRU balance itself. The total number of bytes used in the class vs the total number of bytes in a sub-LRU is used to judge whether to move items between sub-LRU's. This is a layer violation, forcing slabs.c to know more about how items work, as well as EXTSTORE for calculating item sizes from headers.
  Further, it turns out it wasn't necessary for item allocation: if we need to evict an item we _always_ pull from COLD_LRU or force a move from HOT_LRU. So the total doesn't matter.
  The total does matter in the LRU maintainer background thread. However, this thread caches mem_requested to avoid hitting the slab lock too frequently. Since sizes_bytes[] within items.c is generally redundant with mem_requested, we now total sizes_bytes[] from each sub-LRU before starting a batch of LRU juggles.
  This simplifies the code a bit, reduces the layer violations in slabs.c slightly, and actually speeds up some hot paths as a number of branches and operations are removed completely. This also fixes an issue I was having with the restartable memory branch :) recalculating p->requested and keeping a clean API is painful and slow.
  NOTE: This will vary a bit compared to what mem_requested originally did, mostly for large chunked items. For items which fit inside a single slab chunk, the stat is identical. However, items constructed by chaining chunks will have a single large "nbytes" value and end up in the highest slab class. Chunked items can be capped with chunks from smaller slab classes; you will see utilization of chunks but not an increase in mem_requested for them. I'm still thinking this through but this is probably acceptable. Large chunked items should be accounted for separately, perhaps with some new counters so they can be discounted from normal calculations.
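  A sketch of the totalling described above. The constants mirror how items.c addresses its sub-LRUs (slab class id OR'd with a sub-LRU offset), but the function itself is illustrative rather than the real juggle loop:

      #include <stdint.h>

      #define HOT_LRU   0
      #define WARM_LRU  64
      #define COLD_LRU  128
      #define TEMP_LRU  192

      static uint64_t sizes_bytes[256];   /* one slot per (class, sub-LRU) pair */

      /* bytes used by one slab class across all of its sub-LRUs */
      static uint64_t class_bytes(unsigned int clsid) {
          return sizes_bytes[clsid | HOT_LRU]
               + sizes_bytes[clsid | WARM_LRU]
               + sizes_bytes[clsid | COLD_LRU]
               + sizes_bytes[clsid | TEMP_LRU];
      }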
* When nsuffix is 0 space for flags hasn't been allocated so don't memcpy them. [1.5.16] (Matthew Shafer, 2019-05-24, 1 file, -1/+3)
* widen internal item flags to 16bits. (dormando, 2019-05-20, 1 file, -1/+1)
  did a weird dance. nsuffix is no longer an 8bit length, replaced with an ITEM_CFLAGS bit. This indicates whether there is a 32bit set of client flags in the item or not. possible after removing the inlined ascii response header via the previous commit.
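  The idea in miniature (the struct layout and bit value here are hypothetical, not the real item header): a bit in the widened it_flags records whether a 32-bit client-flags field was stored with the item at all.

      #include <stdint.h>
      #include <string.h>

      #define ITEM_CFLAGS 0x100           /* hypothetical bit value */

      struct toy_item {
          uint16_t it_flags;              /* widened from 8 to 16 bits */
          uint8_t  nkey;
          char     data[64];              /* key, optional 32-bit flags, value */
      };

      static uint32_t client_flags(const struct toy_item *it) {
          uint32_t flags = 0;
          if (it->it_flags & ITEM_CFLAGS) {
              memcpy(&flags, it->data + it->nkey, sizeof(flags));
          }
          return flags;                   /* items stored with flags 0 carry no field */
      }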
* remove inline_ascii_response option (dormando, 2019-05-20, 1 file, -13/+4)
  Has defaulted to false since 1.5.0, and with -o modern for a few years before that. Performance is fine, no reported bugs. Always was the intention. Code is simpler without the options.
* expand NEED_ALIGN for chunked items (dormando, 2018-08-08, 1 file, -1/+8)
  some whackarse ARM platforms on specific glibc/gcc (new?) versions trip SIGBUS while reading the header chunk for a split item.
  the header chunk is unfortunate magic: It lives in ITEM_data() at a random offset, is zero sized, and only exists to simplify code around finding the original slab class, and linking/relinking subchunks to an item.
  there's no fix to this which isn't a lot of code. I need to refactor chunked items, and attempted to do so, but couldn't come up with something I liked quickly enough.
  This change pads the first chunk if alignment is necessary, which wastes bytes and a little CPU, but I'm not going to worry a ton for these obscure platforms.
  this works with rebalancing because in the case of an ITEM_CHUNKED header, it treats the item size as the size of the class it resides in, and memcpy's the item during recovery.
  all other cases were changed from ITEM_data to a new ITEM_schunk() inline function that is created when NEED_ALIGN is set, else it's equal to ITEM_data still.
* split storage writer into its own thread (dormando, 2018-08-03, 1 file, -15/+0)
  trying out a simplified slab class backoff algorithm. The LRU maintainer individually schedules slab classes by time, which leads to multiple wakeups in a steady state as they get out of sync.
  This algorithm more simply skips that class more often each time it runs the main loop, using a single scheduled sleep instead. if it goes to sleep for a long time, it also reduces the backoff for all classes. if we're barely awake it should be fine to poke everything.
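  A toy model of the backoff described above (not the actual maintainer loop): a class that yields no work is skipped on progressively more passes, and a long sleep relaxes every class's backoff.

      #include <stdbool.h>
      #include <stdint.h>

      #define MAX_CLASSES 64

      static uint32_t backoff[MAX_CLASSES];   /* passes to skip before re-checking */
      static uint32_t skipped[MAX_CLASSES];

      static bool should_check(int cls) {
          if (skipped[cls] < backoff[cls]) {
              skipped[cls]++;
              return false;
          }
          skipped[cls] = 0;
          return true;
      }

      static void record_result(int cls, bool did_work) {
          if (did_work)
              backoff[cls] = 0;
          else if (backoff[cls] < 1024)
              backoff[cls] = backoff[cls] ? backoff[cls] * 2 : 1;
      }

      static void after_long_sleep(void) {
          for (int i = 0; i < MAX_CLASSES; i++)
              backoff[i] /= 2;                /* ease off after a long sleep */
      }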
* fix gcc warnings (Miroslav Lichvar, 2018-02-19, 1 file, -1/+1)
* limit crawls for metadumper (dormando, 2018-02-12, 1 file, -0/+5)
  LRU crawler metadumper is used for getting snapshot-y looks at the LRU's. Since there's no default limit, it'll get any new items added or bumped since the roll started.
  with this change it limits the number of items dumped to the number that existed in that LRU when the roll was kicked off. You still end up with an approximation, but not a terrible one:
  - items bumped after the crawler passes them likely won't be revisited
  - items bumped before the crawler passes them will likely be visited toward the end, or mixed with new items.
  - deletes are somewhere in the middle.
* extstore: close hole with storage tracking (dormando, 2017-12-13, 1 file, -0/+3)
  items expired/evicted while pulling from tail weren't being tracked, leading to a leak of object counts in pages.
* extstore: fix size tracking and adjust drop_unread (dormando, 2017-12-08, 1 file, -0/+18)
  was early evicting from HOT/WARM LRU's for item headers because the *original* item size was being tracked, then compared to the actual byte totals for the class.
  also adjusts drop_unread so it drops items which are currently in the COLD_LRU
  this is expected to be used with very low compact_under values; ie 2-5 depending on page count and write load. If you can't defrag-compact, drop-compact. but this is still subtly wrong, since drop_compact is now an option.
* extstore: C version of automove algorithm (dormando, 2017-12-07, 1 file, -7/+21)
  couple TODO items left for a new issue I thought of. Also hardcoded memory buffer size which should be fixed. also need to change the "free and re-init" logic to use a boolean in case any related option changes.
* extstore: configure and start time gating (dormando, 2017-11-28, 1 file, -7/+9)
  ./configure --enable-extstore to compile the feature in.
  specify -o ext_path=/whatever to start.
* external storage base commit (dormando, 2017-11-28, 1 file, -0/+21)
  been squashing, reorganizing, and pulling code off to go upstream ahead of merging the whole branch.
* fix use of uninitialized array in lru_maintainer (dormando, 2017-10-29, 1 file, -3/+2)
  use slightly more modern syntax :P Wasn't really a bug since it'd just sleep too much for a bit or cap it back to the MAX. This will give more consistent behavior though.
  Thanks to shqking on github for the report
* make lru_pull_tail function public (dormando, 2017-09-26, 1 file, -14/+1)
  used in separate file for flash branch.
* interface code for flash branch (dormando, 2017-09-26, 1 file, -2/+2)
  removes a few ifdef's and upstreams small internal interface tweaks for easy rebase.
* allow pulling a tail item directly (dormando, 2017-09-26, 1 file, -12/+26)
  plumbing for doing inline reclaim, or similar.
* stats cachedump: always dump COLD LRU (dormando, 2017-08-24, 1 file, -3/+1)
  was defaulting to HOT, but HOT can be empty pretty easily and this can be confusing.
* fix for musl libc: avoid huge stack allocation (dormando, 2017-07-16, 1 file, -5/+11)
  too used to thread stacks being several megabytes, maybe :) The crawlerstats gets a bit big so do a normal memory allocation for it.
  seems to work, but I can't run tests on musl without making the debug binary build.
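  The rough shape of the fix (struct and sizes are placeholders, not the real crawlerstats type): move the large per-class stats scratch space from the thread stack to the heap so it fits within musl's smaller default stacks.

      #include <stdint.h>
      #include <stdlib.h>

      #define MAX_NUMBER_OF_SLAB_CLASSES 64

      struct toy_crawlerstats {
          uint64_t histo[61];
          uint64_t seen, reclaimed;
      };

      void run_crawl(void) {
          /* was: struct toy_crawlerstats stats[MAX_NUMBER_OF_SLAB_CLASSES];  (on the stack) */
          struct toy_crawlerstats *stats =
              calloc(MAX_NUMBER_OF_SLAB_CLASSES, sizeof(*stats));
          if (stats == NULL)
              return;
          /* ... crawl and fill stats ... */
          free(stats);
      }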
* sanity check [1.4.39] (dormando, 2017-07-04, 1 file, -0/+2)
* save four bytes per item if client flags are 0 (dormando, 2017-07-03, 1 file, -2/+6)
  If the size of the flags is 0, it can easily mean to not store anything at all.
* hot_max_age is now hot_max_factor (dormando, 2017-06-24, 1 file, -2/+4)
  defaults at 20% of COLD age. hot_max_age was added because many people's caches were sitting at 32% memory utilized (exactly the size of hot).
  Capping the LRU's by percentage and age would promote some fairness, but I made a mistake making WARM dynamic but HOT static. This is now fixed.
* add a real slab automover algorithm (dormando, 2017-06-23, 1 file, -11/+59)
  converts the python script to C, more or less.
* fix LRU maintainer thread slowdown in edge case (dormando, 2017-06-22, 1 file, -1/+1)
  if doing a lot of bumps within WARM LRU, the thread can start to sleep more often because it thinks it's not completing any work.
* lru_crawler avoid-infinite-runs [1.4.37] (dormando, 2017-06-04, 1 file, -1/+10)
  under enough set pressure some slab classes may never complete scanning, as there's always something new at the top.
  this is a quick workaround for the internal scanner. always use a limit seeded at the size of the largest class. smaller classes will simply finish sooner.
  needs a better fix for the user-based commands. change of the API would allow for per-crawler tocrawl values.
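  A small sketch of the seeding logic (names are illustrative): every class's crawl budget is set from the item count of the largest class, so smaller classes simply run out of items and finish early.

      #include <stdint.h>

      static uint32_t crawl_limit(const uint32_t *items_in_class, int nclasses) {
          uint32_t limit = 0;
          for (int i = 0; i < nclasses; i++) {
              if (items_in_class[i] > limit)
                  limit = items_in_class[i];
          }
          return limit;   /* used as the tocrawl value for every class */
      }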
* per-LRU hits breakdown (dormando, 2017-06-04, 1 file, -0/+32)
  no actual speed loss. emulate the slab_stats "get_hits" by totalling up the per-LRU get_hits. could sub-LRU many stats but should use a different command/interface for that.
* LRU crawler scheduling improvements (dormando, 2017-05-29, 1 file, -10/+26)
  when trying to manually run a crawl, the internal autocrawler is now blocked from restarting for 60 seconds.
  the internal autocrawl now independently schedules LRU's, and can re-schedule sub-LRU's while others are still running. should allow much better memory control when some sub-lru's (such as TEMP or WARM) are small, or slab classes are differently sized.
  this also makes the crawler drop its lock frequently.. this fixes an issue where a long crawl happening at the same time as a hash table expansion could hang the server until the crawl finished.
  to improve still:
  - elapsed time can be wrong in the logger entry
  - need to cap number of entries scanned. enough set pressure and a crawl may never finish.
* fix lru thread sleeper more (dormando, 2017-05-28, 1 file, -8/+13)
  added a bug which caused LRU juggler to never sleep. increased max sleep time to 1s. also fixed a bug where every other LRU round had a 0 sleep.
  there's still wakeup overkill: once one slab class becomes active, it will "desync" from the amount of sleep required for other slab classes. They will pull the LRU once per second, but the thread wakes up, up to the number of active slab classes per second.
  deprioritizing, but a clean way of re-syncing the slab classes would minimize wakeups.
* Sleep more aggressively in some threads (Grant Mathews, 2017-05-22, 1 file, -2/+4)
  The logger and lru maintainer threads both adjust how long they sleep based on how busy they are. This adjustment should be exponential, to more quickly adjust to workloads.
  The logger thread in particular would only adjust 50 microseconds at a time, and was capped at 100 milliseconds of sleep, causing many unnecessary wakeups on an otherwise idle dev machine. Adjust this cap to 1 second.
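  A toy version of the adjustment (the constants are illustrative): back off exponentially while idle, up to a one second cap, and snap back down as soon as there is work.

      #include <stdint.h>

      #define MIN_SLEEP_US      50        /* floor */
      #define MAX_SLEEP_US 1000000        /* 1 second cap */

      static uint32_t next_sleep(uint32_t cur_us, int did_work) {
          if (did_work)
              return MIN_SLEEP_US;
          if (cur_us < MIN_SLEEP_US)
              cur_us = MIN_SLEEP_US;
          cur_us *= 2;                    /* exponential, not +50us per pass */
          return cur_us > MAX_SLEEP_US ? MAX_SLEEP_US : cur_us;
      }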
* refactor chunk chaining for memory efficiency [1.4.36] (dormando, 2017-03-19, 1 file, -38/+85)
  Memory chunk chains would simply stitch multiple chunks of the highest slab class together. If your item was 17k and the chunk limit is 16k, the item would use 32k of space instead of a bit over 17k.
  This refactor simplifies the slab allocation path and pulls the allocation of chunks into the upload process. A "large" item gets a small chunk assigned as an object header, rather than attempting to inline a slab chunk into a parent chunk. It then gets chunks individually allocated and added into the chain while the object uploads.
  This solves a lot of issues:
  1) When assembling new, potentially very large items, we don't have to sit and spin evicting objects all at once. If there are 20 16k chunks in the tail and we allocate a 1 meg item, the new item will evict one of those chunks in between each read, rather than trying to guess how many loops to run before giving up. Very large objects take time to read from the socket anyway.
  2) Simplifies code around the initial chunk. Originally embedding data into the top chunk and embedding data at the same time required a good amount of fiddling. (Though this might flip back to embedding the initial chunk if I can clean it up a bit more).
  3) Pulling chunks individually means the slabber code can be flattened to not think about chunks aside from freeing them, which culled a lot of code and removed branches from a hot path.
  4) The size of the final chunk is naturally set to the remaining amount of bytes that need to be stored, which means chunks from another slab class can be pulled to "cap off" a large item, reducing memory overhead.
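  Point 4 in miniature (the class size table below is made up): the final chunk of a chain only needs to hold the remaining bytes, so it can be taken from a smaller slab class instead of another max-size chunk.

      #include <stddef.h>

      static const size_t class_size[] = {
          96, 192, 384, 768, 1536, 3072, 6144, 12288, 16384 /* max chunk */
      };
      static const int nclasses = (int)(sizeof(class_size) / sizeof(class_size[0]));

      /* smallest class whose chunk can hold the remaining bytes */
      static int capping_class(size_t remaining) {
          for (int i = 0; i < nclasses; i++) {
              if (class_size[i] >= remaining)
                  return i;
          }
          return nclasses - 1;
      }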
* fix refcount leak in LRU bump buf (dormando, 2017-03-19, 1 file, -0/+3)
  takes a long time for mc-crusher to find this. I didn't run it long enough :(
* allow TEMP_LRU to work with flat LRU. (dormando, 2017-02-05, 1 file, -7/+5)
* Allow switching LRU algo's at runtime (dormando, 2017-01-30, 1 file, -14/+21)
  If LRU maintainer thread is started, this allows you to switch between "flat" and "segmented" modes at runtime.
  The maintainer thread will drain HOT/WARM LRU's if put into flat mode, and no new items should fill in.
  This was much easier than expected...