path: root/slabs.c
Commit message (author, date, files changed, lines -removed/+added)
* core: give threads unique names (dormando, 2022-11-01, 1 file, -0/+1)
  Allow users to differentiate thread functions externally to memcached.
  Useful for setting priorities or pinning threads to CPUs.
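  A minimal sketch, assuming Linux/glibc, of naming a thread so it can be told
  apart in tools like top -H; pthread_setname_np is a GNU extension and the
  thread name used here is made up:

      #define _GNU_SOURCE
      #include <pthread.h>

      static void *worker(void *arg) {
          (void)arg;
          return NULL;
      }

      int main(void) {
          pthread_t tid;
          pthread_create(&tid, NULL, worker, NULL);
          /* Names are capped at 15 characters plus the NUL terminator on Linux. */
          pthread_setname_np(tid, "mc-slab-maint");   /* hypothetical name */
          pthread_join(tid, NULL);
          return 0;
      }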
* Fix function prototypes for clang errors (Khem Raj, 2022-10-13, 1 file, -1/+1)
  clang-15+ has started diagnosing them as errors:

    thread.c:925:18: error: a function declaration without a prototype is
    deprecated in all versions of C [-Werror,-Wstrict-prototypes]
    void STATS_UNLOCK() {
                     ^
                      void

  Signed-off-by: Khem Raj <raj.khem@gmail.com>
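  For illustration only (mirroring the declaration quoted in the diagnostic, not
  the actual diff): in C, an empty parameter list leaves the prototype
  unspecified, which clang-15+ rejects under -Werror; writing (void) declares a
  real zero-argument prototype.

      void STATS_UNLOCK();        /* old style: no prototype, now an error under clang-15 */
      void STATS_UNLOCK(void);    /* fixed: explicit zero-argument prototype */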
* core: move more storage functions to storage.c (dormando, 2020-10-30, 1 file, -0/+1)
  extstore.h is now only used from storage.c. Starting a path towards getting
  the storage interface to be more generalized. Should be no functional
  changes.
* restart: fix failure on deleted chunked items (dormando, 2020-04-12, 1 file, -0/+2)
  The classid is expected to be set correctly on freed memory.
* Fix t/64bit.t test failure in Windows. (Jefty Negapatan, 2020-04-10, 1 file, -1/+4)
  t/64bit.t .. 1/6
    Failed test 'expected (faked) value of total_malloced' at t/64bit.t line 30.
      got: '32'
      expected: '4294967328'
    Failed test 'hit size limit' at t/64bit.t line 41.

  In Windows, long is just 32-bit even for 64-bit platforms.
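  A small illustration of the data-model difference behind the failure: 64-bit
  Windows is LLP64, so long stays 32 bits and byte counters like
  total_malloced need a fixed-width or pointer-sized type instead.

      #include <stdint.h>
      #include <stdio.h>

      int main(void) {
          printf("sizeof(long)     = %zu\n", sizeof(long));     /* 4 on Win64, 8 on LP64 Linux */
          printf("sizeof(uint64_t) = %zu\n", sizeof(uint64_t)); /* always 8 */
          printf("sizeof(size_t)   = %zu\n", sizeof(size_t));   /* 8 on any 64-bit target */
          return 0;
      }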
* restart: fix corrupted restart in some scenarios (dormando, 2020-03-26, 1 file, -0/+4)
  If the mmap file is reused but the memory isn't supposed to be reused, pages
  are thrown into the global page pool. Normally when pages are released into
  the pool the header of the page is zero'ed so the restart_check() code will
  know to place it back into the global pool.

  When restarting multiple times the slabs_prefill() part of the startup code
  was missing this zero'ing step, so the _next_ time restart happens properly
  restart_check() could attempt to recover that memory.
* restart: always wipe memory from global pool (dormando, 2020-03-26, 1 file, -2/+4)
  An ifdef was added for FreeBSD at some point since it gives you 0'd out
  memory from malloc, but memory from the global pool isn't always from
  malloc.
* slabs: fix for skipping completed items [tag: 1.5.22] (dormando, 2020-02-01, 1 file, -24/+15)
  This change removes the unused environment variable for running N chunk move
  checks per slab lock and fixes the "completed" check.

  My idea for the "completed" bits was originally to skip acquiring the slabs
  lock for items we know to be complete. The check was accidentally made
  redundant with the post-lock flags check, so there was no actual speedup
  during busy loops.

  This should now have much less performance impact for page moving.
* slabs: fix crash in page mover (dormando, 2020-01-31, 1 file, -1/+1)
  PR #524 speeds up the slab page mover for some situations: if an item is
  expired, but not yet reaped (by a fetch, or LRU crawler), the slab page
  mover unlinks the item, marks it as free, and doesn't mark the loop as
  "Busy". If any items were marked as 'busy' (needing to be freed, re-checked,
  or actively being used) the page mover will re-scan all entries from the
  top. With this change the re-scan becomes less likely.

  The bug is in the check for chunked items. `ch` is only set if an ITEM_CHUNK
  is detected. If ITEM_CHUNKED is instead detected we also need to handle it
  with the full free routine. This was not done. This frees the header item
  (CHUNKED), and then leaks all of the chunks associated with it.

  Getting into this situation is hard:
  - have an expired chunked item in memory that the LRU crawler hasn't caught.
  - have the page mover find and move the page specifically with the header
    item, not an individual chunk (these are in the small slab classes).
  - later on, pages with the leaked chunks get moved to other slab classes.
    The slab mover thinks they're either free or that it needs to unlink the
    orphaned chunks.

  This should then crash in the page mover since the next/prev headers may no
  longer make sense. Or, worse, this has to go on for a while since they may
  still make sense. This one-liner fixes it.
* Refactoring proposal of allocating large chunk of slabs. (David Carlier, 2020-01-13, 1 file, -15/+17)
  Using native capabilities on FreeBSD, super pages.
* Remove multiple double-initializations of condition variables and mutexes (Daniel Schemmel, 2019-11-10, 1 file, -6/+0)
  - `slabs_rebalance_lock`
  - `slab_rebalance_cond`
  - `maintenance_lock`
  - `lru_crawler_lock`
  - `lru_crawler_cond`
  - `lru_maintainer_lock`
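  A minimal sketch of the pattern being removed (names borrowed from the list
  above, bodies invented): each lock or condition variable should be
  initialized exactly once, either statically or with a single *_init call.

      #include <pthread.h>

      static pthread_mutex_t slabs_rebalance_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  slab_rebalance_cond  = PTHREAD_COND_INITIALIZER;

      void slabs_setup(void) {
          /* Re-initializing an already-initialized mutex or condition
           * variable is undefined behavior; calls like these are redundant
           * with the static initializers above and can simply be dropped. */
          /* pthread_mutex_init(&slabs_rebalance_lock, NULL); */
          /* pthread_cond_init(&slab_rebalance_cond, NULL);   */
      }

      int main(void) { slabs_setup(); return 0; }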
* slab rebalance performance improvements (Daniel Byrne, 2019-10-17, 1 file, -9/+35)
  For full discussion see: https://github.com/memcached/memcached/pull/524
  - Avoids looping in most cases where an item had to be force-freed
  - Avoids re-locking and re-checking already completed memory
  - Uses a backoff timer for sleeping when busy items are found
* restartable cache (dormando, 2019-09-17, 1 file, -12/+78)
  "-e /path/to/tmpfsmnt/file"
  SIGUSR1 for graceful stop

  Restart requires the same memory limit, slab sizes, and some other
  infrequently changed details. Most other options and features can change
  between restarts. Binary can be upgraded between restarts.

  Restart does some fixup work on start for every item in cache. Can take
  over a minute with more than a few hundred million items in cache.

  Keep in mind when a cache is down it may be missing invalidations, updates,
  and so on.
* move mem_requested from slabs.c to items.c (dormando, 2019-07-26, 1 file, -100/+5)
  mem_requested is an oddball counter: it's the total number of bytes
  "actually requested" from the slab's caller. It's mainly used for a stats
  counter, alerting the user that the slab factor may not be efficient if the
  gap between total_chunks * chunk_size - mem_requested is large.

  However, since chunked items were added it's _also_ used to help the LRU
  balance itself. The total number of bytes used in the class vs the total
  number of bytes in a sub-LRU is used to judge whether to move items between
  sub-LRU's.

  This is a layer violation; forcing slabs.c to know more about how items
  work, as well as EXTSTORE for calculating item sizes from headers. Further,
  it turns out it wasn't necessary for item allocation: if we need to evict an
  item we _always_ pull from COLD_LRU or force a move from HOT_LRU. So the
  total doesn't matter.

  The total does matter in the LRU maintainer background thread. However, this
  thread caches mem_requested to avoid hitting the slab lock too frequently.
  Since sizes_bytes[] within items.c is generally redundant with
  mem_requested, we now total sizes_bytes[] from each sub-LRU before starting
  a batch of LRU juggles.

  This simplifies the code a bit, reduces the layer violations in slabs.c
  slightly, and actually speeds up some hot paths as a number of branches and
  operations are removed completely. This also fixes an issue I was having
  with the restartable memory branch :) recalculating p->requested and keeping
  a clean API is painful and slow.

  NOTE: This will vary a bit compared to what mem_requested originally did,
  mostly for large chunked items. For items which fit inside a single slab
  chunk, the stat is identical. However, items constructed by chaining chunks
  will have a single large "nbytes" value and end up in the highest slab
  class. Chunked items can be capped with chunks from smaller slab classes;
  you will see utilization of chunks but not an increase in mem_requested for
  them.

  I'm still thinking this through but this is probably acceptable. Large
  chunked items should be accounted for separately, perhaps with some new
  counters so they can be discounted from normal calculations.
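  A hypothetical sketch of the replacement accounting only; memcached's real
  indexing of sizes_bytes[] differs. It shows the idea of summing per-sub-LRU
  byte counters in items.c rather than asking slabs.c for mem_requested.

      #include <stdint.h>

      enum { MAX_CLASSES = 64, NUM_SUB_LRUS = 4 };   /* HOT, WARM, COLD, TEMP */

      /* one byte counter per (sub-LRU, slab class) pair -- illustrative layout */
      static uint64_t sizes_bytes[NUM_SUB_LRUS][MAX_CLASSES];

      static uint64_t class_bytes_used(int clsid) {
          uint64_t total = 0;
          for (int lru = 0; lru < NUM_SUB_LRUS; lru++)
              total += sizes_bytes[lru][clsid];
          return total;   /* totaled before a batch of LRU juggles */
      }

      int main(void) { return (int)class_bytes_used(1); }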
* widen internal item flags to 16bits. (dormando, 2019-05-20, 1 file, -2/+2)
  Did a weird dance. nsuffix is no longer an 8bit length, replaced with
  ITEM_CFLAGS bit. This indicates whether there is a 32bit set of client flags
  in the item or not. Possible after removing the inlined ascii response
  header via previous commit.
* expand NEED_ALIGN for chunked items (dormando, 2018-08-08, 1 file, -2/+10)
  Some whackarse ARM platforms on specific glibc/gcc (new?) versions trip
  SIGBUS while reading the header chunk for a split item. The header chunk is
  unfortunate magic: it lives in ITEM_data() at a random offset, is zero
  sized, and only exists to simplify code around finding the original slab
  class, and linking/relinking subchunks to an item.

  There's no fix to this which isn't a lot of code. I need to refactor chunked
  items, and attempted to do so, but couldn't come up with something I liked
  quickly enough.

  This change pads the first chunk if alignment is necessary, which wastes
  bytes and a little CPU, but I'm not going to worry a ton for these obscure
  platforms.

  This works with rebalancing because in the case of an ITEM_CHUNKED header,
  it treats the item size as the size of the class it resides in, and
  memcpy's the item during recovery. All other cases were changed from
  ITEM_data to a new ITEM_schunk() inline function that is created when
  NEED_ALIGN is set, else it's equal to ITEM_data still.
* support transparent hugepages on Linux (Chen-Yu Tsai, 2018-07-05, 1 file, -1/+66)
  Linux has supported transparent huge pages for quite some time. Memory
  regions can be marked for conversion to huge pages with madvise.
  Alternatively, users can have the system default to using huge pages for
  all memory regions when applicable, i.e. when the mapped region is large
  enough, the properly aligned pages will be converted.

  Using either method, we would preallocate memory for the cache with proper
  alignment, and call madvise on it. Whether the memory region actually gets
  converted to hugepages ultimately depends on the setting of
  /sys/kernel/mm/transparent_hugepage/enabled. The existence of this file is
  also checked to see if transparent huge pages support is compiled into the
  kernel.

  If any step of the preallocation fails, we simply fall back to standard
  allocation, without even preallocating slabs, as they would not have the
  proper alignment or settings anyway.
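  A sketch of the general technique, not memcached's actual allocator: grab an
  aligned region and advise the kernel to back it with transparent huge pages;
  conversion still depends on /sys/kernel/mm/transparent_hugepage/enabled.

      #define _DEFAULT_SOURCE
      #include <stdlib.h>
      #include <sys/mman.h>

      static void *alloc_thp(size_t len) {
          const size_t huge = 2u * 1024 * 1024;       /* common THP size on x86-64 */
          void *mem = NULL;

          len = (len + huge - 1) & ~(huge - 1);       /* round up to a huge-page multiple */
          if (posix_memalign(&mem, huge, len) != 0)
              return NULL;                            /* caller falls back to plain malloc */

          /* Failure here is non-fatal: the region still works as 4 KiB pages. */
          (void)madvise(mem, len, MADV_HUGEPAGE);
          return mem;
      }

      int main(void) {
          void *cache = alloc_thp(64u * 1024 * 1024); /* e.g. a 64 MiB preallocation */
          free(cache);
          return 0;
      }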
* extstore: revise automove algorithm (dormando, 2018-02-08, 1 file, -3/+6)
  Allows reassigning memory from global page pool to a specific class. This
  allows simplifying the algorithm to rely on moving memory to/from global,
  removing hacks around relaxing free memory requirements.
* quick fix for slab mover deadlock (dormando, 2018-01-23, 1 file, -0/+2)
  This fix may be replaced with a better restructuring; as this was done more
  properly in the result handler code below.

  Not releasing the slab lock while unlinking can cause a deadlock:
  - item A unlinks, which locks LRU, then tries to lock SLAB
  - page mover locks SLAB, locks item B, tries to unlink item B
  - if done after A locks LRU, it deadlocks while B tries to lock LRU

  doh :/
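  Stand-in locks only, to illustrate the ordering cycle described above and
  the shape of the quick fix: the mover drops the slab lock before touching
  the LRU, so the two threads never hold both locks in opposite orders.

      #include <pthread.h>

      static pthread_mutex_t slab_lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_mutex_t lru_lock  = PTHREAD_MUTEX_INITIALIZER;

      static void page_mover_step(void) {
          pthread_mutex_lock(&slab_lock);
          /* ... find a busy item that must be unlinked ... */
          pthread_mutex_unlock(&slab_lock);  /* drop before taking the LRU lock */

          pthread_mutex_lock(&lru_lock);
          /* ... unlink the item from its LRU ... */
          pthread_mutex_unlock(&lru_lock);
      }

      int main(void) { page_mover_step(); return 0; }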
* extstore: prefill global page pool with extstore (dormando, 2017-12-18, 1 file, -1/+14)
  The slab page mover algo would fill memory to the point of evicting for a
  few seconds before jumping to life and shoveling some pages into the global
  pool. Swore I was going to fix this post-release, but I had a moment of
  inspiration after finding some code from another branch that did half the
  work. After a bunch of stupid bugs it seems to work.

  "-o slab_automove_freeratio=0.N" is now an option. This is *percentage of
  total memory*, so don't set it too high.
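  Hypothetical arithmetic only, to show the scale of the option: a ratio of
  0.01 against a 1 GiB cache with 1 MiB pages works out to roughly 10 pages
  kept in the global pool.

      #include <stdint.h>
      #include <stdio.h>

      static uint32_t free_page_target(uint64_t mem_limit, uint32_t page_size,
                                       double free_ratio) {
          return (uint32_t)((mem_limit * free_ratio) / page_size);
      }

      int main(void) {
          printf("%u pages\n", free_page_target(1024ULL * 1024 * 1024,
                                                1024 * 1024, 0.01));
          return 0;
      }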
* extstore: C version of automove algorithm (dormando, 2017-12-07, 1 file, -0/+14)
  Couple TODO items left for a new issue I thought of. Also hardcoded memory
  buffer size which should be fixed. Also need to change the "free and
  re-init" logic to use a boolean in case any related option changes.
* extstore: redefine mem_limit_reached for callers (dormando, 2017-12-03, 1 file, -1/+1)
  Uses actual mem limit... may be wrong if slab_reassign isn't enabled :(
  Allows better judgement for flusher.
* extstore: always output global page pool (dormando, 2017-11-28, 1 file, -5/+3)
  If we want to disable automove and run an external algo that examines the
  page pool count... removes temp hack from external script.
* extstore: page mover can decrement storage (dormando, 2017-11-28, 1 file, -3/+10)
  Page mover didn't have a reference to the storage object, so items lost
  during page move transitions wouldn't be decremented from storage.
* external storage base commit (dormando, 2017-11-28, 1 file, -0/+18)
  Been squashing, reorganizing, and pulling code off to go upstream ahead of
  merging the whole branch.
* sleep longer between slab move runs (dormando, 2017-06-23, 1 file, -1/+1)
  Waits at least a full millisecond before scanning the page again. Should
  hammer a lot less in the background when stuck. Could probably up more, but
  want to keep it relatively aggressive in case of hot memory that it might
  have to free a few times.
* add a real slab automover algorithm (dormando, 2017-06-23, 1 file, -0/+16)
  Converts the python script to C, more or less.
* slab_rebal: delete busy items if stuck (dormando, 2017-06-23, 1 file, -1/+21)
  If we loop through a slab too many times without freeing everything, delete
  items stuck with high refcounts. They should bleed off so long as the
  connections aren't jammed holding them.

  Should be possible to force rescues in this case as well, but that's more
  code so will follow up later. Need a big-ish refactor.
* fix long lock pause in hash expansion (dormando, 2017-06-21, 1 file, -0/+5)
  Plus a locking fix for slabs reassign. LRU maintainer can call into slabs
  reassign + lru crawler, so we need to pause it before attempting to pause
  the other two threads.
* fix crash in page mover while using large items (dormando, 2017-06-03, 1 file, -2/+2)
  Also fixes memory tracking bug when releasing memory.
* Spelling fixes (Josh Soref, 2017-05-23, 1 file, -3/+3)
  * accesses
  * amount
  * append
  * command
  * cyrillic
  * daemonize
  * detaches
  * detail
  * documentation
  * dynamically
  * enabled
  * existence
  * extra
  * implementations
  * incoming
  * increment
  * initialize
  * issue
  * javascript
  * number
  * optimization
  * overall
  * pipeline
  * reassign
  * reclaimed
  * response
  * responses
  * sigabrt
  * specific
  * specificity
  * tidiness
* refactor chunk chaining for memory efficiency [tag: 1.4.36] (dormando, 2017-03-19, 1 file, -111/+69)
  Memory chunk chains would simply stitch multiple chunks of the highest slab
  class together. If your item was 17k and the chunk limit is 16k, the item
  would use 32k of space instead of a bit over 17k.

  This refactor simplifies the slab allocation path and pulls the allocation
  of chunks into the upload process. A "large" item gets a small chunk
  assigned as an object header, rather than attempting to inline a slab chunk
  into a parent chunk. It then gets chunks individually allocated and added
  into the chain while the object uploads.

  This solves a lot of issues:

  1) When assembling new, potentially very large items, we don't have to sit
  and spin evicting objects all at once. If there are 20 16k chunks in the
  tail and we allocate a 1 meg item, the new item will evict one of those
  chunks in between each read, rather than trying to guess how many loops to
  run before giving up. Very large objects take time to read from the socket
  anyway.

  2) Simplifies code around the initial chunk. Originally embedding data into
  the top chunk and embedding data at the same time required a good amount of
  fiddling. (Though this might flip back to embedding the initial chunk if I
  can clean it up a bit more.)

  3) Pulling chunks individually means the slabber code can be flattened to
  not think about chunks aside from freeing them, which culled a lot of code
  and removed branches from a hot path.

  4) The size of the final chunk is naturally set to the remaining amount of
  bytes that need to be stored, which means chunks from another slab class
  can be pulled to "cap off" a large item, reducing memory overhead.
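  Stand-in structures, not memcached's actual item layout, sketching the shape
  after the refactor: a small header object points at a chain of individually
  allocated chunks, and the final chunk can come from a smaller slab class
  sized to the remaining bytes.

      #include <stdint.h>

      typedef struct item_chunk_sketch {
          struct item_chunk_sketch *next;
          uint32_t size;        /* usable bytes in this chunk */
          uint32_t used;        /* bytes filled so far during upload */
          uint8_t  slabs_clsid; /* slab class this chunk was pulled from */
          char     data[];
      } item_chunk_sketch;

      typedef struct chunked_header_sketch {
          item_chunk_sketch *chunks;  /* head of the chain */
          uint32_t nbytes;            /* total value length */
      } chunked_header_sketch;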
* stop using atomics for item refcount management (dormando, 2017-01-22, 1 file, -3/+3)
  When I first split the locks up further I had a trick where "item_remove()"
  did not require holding the associated item lock. If an item were to be
  freed, it would then do the necessary work.

  Since then, all calls to refcount_incr and refcount_decr only happen while
  the item is locked. This was mostly due to the slab mover being very tricky
  with locks.

  The atomic is no longer needed as the refcount is only ever checked after a
  lock to the item. Calling atomics is pretty expensive, especially in
  multicore/multisocket scenarios. This yields a notable performance
  increase.
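  An illustrative stub, not memcached's item struct: once every refcount
  change happens under the item's lock, plain increments and decrements
  suffice and the atomic read-modify-write (and its cross-core traffic) can
  be dropped.

      #include <stdint.h>

      typedef struct { uint16_t refcount; } item_stub;

      static inline uint16_t refcount_incr_locked(item_stub *it) {
          return ++it->refcount;   /* caller must hold the item lock */
      }

      static inline uint16_t refcount_decr_locked(item_stub *it) {
          return --it->refcount;   /* caller must hold the item lock */
      }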
* fix over-allocating with large item support (dormando, 2016-08-11, 1 file, -1/+2)
  -I 2m would still allocate 2mb pages, then only use 1mb of it, halving
  memory capacity.
* slabs reassigns works with chunks and chunked items. (dormando, 2016-07-12, 1 file, -40/+117)
  Also fixes the new LRU algorithm to balance by total bytes used rather than
  total chunks used, since total chunks used isn't tracked for multi-chunk
  items.

  Also fixes a bug where the lru limit wasn't being utilized for HOT_LRU.

  Also some cleanup from previous commits.
* startup options for chunked items. (dormando, 2016-07-12, 1 file, -2/+2)
  Has spent some time under performance testing. For larger items there's
  less than 5% extra CPU usage, however the max usable CPU when using large
  items is 1/10th or less before you run out of bandwidth. Mixed small/large
  items will still balance out.

  Comments out debugging (which must be removed for release). Restores
  defaults and ensures only t/chunked-items.t is affected. dyn-maxbytes and
  item_size_max tests still fail. append/prepend aren't implemented, sasl
  needs to be guarded. Slab mover needs to be updated.
* chunked item second checkpoint (dormando, 2016-07-12, 1 file, -3/+8)
  Can actually fetch items now, and fixed a few bugs with storage/freeing.
  Added fetching for binprot. Added some basic tests.

  Many tests still fail for various reasons, and append/prepend isn't fixed
  yet.
* chunked items checkpoint commit (dormando, 2016-07-12, 1 file, -33/+141)
  Can set and store large items via asciiprot.
  gets/append/prepend/binprot not implemented yet.
* clean up global stats code a little. (dormando, 2016-06-27, 1 file, -4/+4)
  Tons of stats were left in the global stats structure that're no longer
  used, and it looks like we kept accidentally adding new ones in there.
  There's also an unused mutex.

  Split global stats into `stats` and `stats_state`. Initialize via memset,
  reset only `stats` via memset, removing several places where stats values
  get repeated. Looks much cleaner and should be less error prone.
* cache_memlimit command for tuning runtime maxbytes (dormando, 2016-06-24, 1 file, -0/+38)
  Allows dynamically increasing the memory limit of a running system, if
  memory isn't being preallocated. If `-o modern` is in use, can also
  dynamically lower memory usage. Pages are free()'ed back to the OS via the
  slab rebalancer as memory is freed up.

  Does not guarantee the OS will actually give the memory back for other
  applications to use; that depends on how the OS handles memory.
* allow manually specifying slab class sizes (dormando, 2016-06-24, 1 file, -3/+11)
  "-o slab_sizes=100-200-300-400-500" will create 5 slab classes of those
  specified sizes, with the final class being item_max_size.

  Using the new online "stats sizes" command, it's possible to determine if
  the typical factor-based slab class growth rate doesn't align well with how
  items are stored.

  This is dangerous unless you really know what you're doing. If your items
  have an exact or very predictable size this makes a lot of sense. If they
  do not, the defaults are safer.
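  A minimal sketch, not memcached's option parser, of turning a
  "100-200-300-400-500" style argument into an ascending list of class sizes;
  real validation (alignment, item_max_size cap, class count limit) is
  omitted.

      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      static int parse_slab_sizes(const char *arg, uint32_t *sizes, int max_classes) {
          char *copy = strdup(arg);
          if (copy == NULL)
              return -1;

          int n = 0;
          for (char *tok = strtok(copy, "-"); tok != NULL && n < max_classes;
               tok = strtok(NULL, "-")) {
              long v = strtol(tok, NULL, 10);
              if (v <= 0 || (n > 0 && (uint32_t)v <= sizes[n - 1])) {
                  free(copy);
                  return -1;                 /* sizes must be positive and ascending */
              }
              sizes[n++] = (uint32_t)v;
          }
          free(copy);
          return n;
      }

      int main(void) {
          uint32_t sizes[64];
          int n = parse_slab_sizes("100-200-300-400-500", sizes, 64);
          for (int i = 0; i < n; i++)
              printf("class %d: %u bytes\n", i + 1, sizes[i]);
          return 0;
      }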
* online hang-free "stats sizes" command. (dormando, 2016-06-24, 1 file, -0/+4)
  "stats sizes" is one of the last cache-hanging commands. With millions of
  items it can hang for many seconds.

  This commit changes the command to be dynamic. A histogram is tracked as
  items are linked and unlinked from the cache. The tracking is enabled or
  disabled at runtime via "stats sizes_enable" and "stats sizes_disable".

  This presently "works" but isn't accurate. Giving it some time to think
  over before switching to requiring that CAS be enabled. Otherwise the
  values could underflow if items are removed that existed before the sizes
  tracker is enabled. This attempts to work around it by using it->time,
  which gets updated on fetch, and is thus inaccurate.
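  A hypothetical sketch of the dynamic approach (bucket width and counts are
  invented): a histogram keyed by item size is adjusted on every link and
  unlink, so "stats sizes" can be answered without walking the whole cache.

      #include <stdint.h>

      enum { SIZE_BUCKET = 32, NUM_BUCKETS = 2048 };

      static uint64_t size_hist[NUM_BUCKETS];

      static int size_bucket(uint32_t ntotal) {
          uint32_t b = ntotal / SIZE_BUCKET;
          return b < NUM_BUCKETS ? (int)b : NUM_BUCKETS - 1;
      }

      void item_link_sizes(uint32_t ntotal)   { size_hist[size_bucket(ntotal)]++; }
      void item_unlink_sizes(uint32_t ntotal) { size_hist[size_bucket(ntotal)]--; }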
* bump some global stats to 64bit uints (dormando, 2016-06-05, 1 file, -2/+2)
  total_items is pretty easy to overflow. Upped some of the others just in
  case.
* fix build with musl libc (Natanael Copa, 2016-05-28, 1 file, -1/+1)
  musl libc will warn if you include sys/signal.h instead of signal.h as
  specified by POSIX. Build will fail due to -Werror explicitly being set.

  Fix it by using the POSIX location.

  fixes #138
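  For illustration, the include swap in question:

      #include <signal.h>          /* POSIX location: portable across glibc and musl */
      /* #include <sys/signal.h>      legacy alias; musl warns, and -Werror fails the build */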
* fix over-inflation of total_malloced (dormando, 2015-11-18, 1 file, -1/+1)
  mem_alloced was getting increased every time a page was assigned out of
  either malloc or the global page pool. This means total_malloced will
  inflate forever as pages are reused, and once limit_maxbytes is surpassed
  it will stop attempting to malloc more memory. The result is we would stop
  malloc'ing new memory too early if page reclaim happens before the whole
  thing fills.

  The test already caused this condition, so adding the extra checks was
  trivial.
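  Illustrative only, with stand-in names: count bytes toward the malloc total
  solely when memory really comes from malloc, not when a page is handed back
  out of the reusable global pool.

      #include <stdlib.h>

      static size_t mem_alloced;                                /* stand-in counter */
      static void *pop_global_page_pool(void) { return NULL; }  /* stub: pool empty */

      static void *assign_page(size_t page_size) {
          void *page = pop_global_page_pool();
          if (page == NULL) {
              page = malloc(page_size);
              if (page != NULL)
                  mem_alloced += page_size;   /* newly malloc'ed: count it */
          }
          /* reused pool page: mem_alloced unchanged, avoiding the inflation */
          return page;
      }

      int main(void) {
          free(assign_page(1024 * 1024));
          return 0;
      }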
* try harder to save items (dormando, 2015-11-18, 1 file, -27/+47)
  Previously the slab mover would evict items if the new chunk was within the
  slab page being moved. Now it will do an inline reclaim of the chunk and
  try until it runs out of memory.
* split rebal_evictions into _nomem and _samepage (dormando, 2015-11-18, 1 file, -6/+10)
  Gross oversight putting two conditions into the same variable. Now can tell
  if we're evicting because we're hitting the bottom of the free memory pool,
  or if we keep trying to rescue items into the same page as the one being
  cleared.
* stop using slab class 255 for page mover (dormando, 2015-11-18, 1 file, -6/+8)
  Class 255 is now a legitimate class, used by the NOEXP LRU when the
  expirezero_does_not_evict flag is enabled. Instead, we now force a single
  bit ITEM_SLABBED when a chunk is returned to the slabber, and
  ITEM_SLABBED|ITEM_FETCHED means it's been cleared for a page move.
  item_alloc overwrites the chunk's flags on set.

  The only weirdness was slab_free |='ing in the ITEM_SLABBED bit. I tracked
  that down to a commit in 2003 titled "more debugging" and can't come up
  with a good enough excuse for preserving an item's flags when it's been
  returned to the free memory pool. So now we overload the flag meaning.
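  An illustration with stand-in bit values (the real flags live in
  memcached.h): ITEM_SLABBED alone marks a chunk returned to the slabber, and
  ITEM_SLABBED|ITEM_FETCHED marks one the page mover has already cleared,
  removing the need for class 255 as a sentinel.

      #include <stdbool.h>
      #include <stdint.h>

      #define ITEM_SLABBED  (1 << 2)
      #define ITEM_FETCHED  (1 << 3)

      static bool chunk_is_free(uint16_t flags) {
          return (flags & ITEM_SLABBED) != 0;
      }

      static bool chunk_cleared_for_move(uint16_t flags) {
          return (flags & (ITEM_SLABBED | ITEM_FETCHED))
                  == (ITEM_SLABBED | ITEM_FETCHED);
      }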
* call STATS_LOCK() less in slab mover. (dormando, 2015-11-18, 1 file, -12/+14)
  Uses the slab_rebal struct to summarize stats, more occasionally grabbing
  the global lock to fill them in, instead.
* "mem_requested" from "stats slabs" is now accuratedormando2015-11-181-3/+4
| | | | | | During an item rescue, item size was being added to the slab class when the new chunk requested, and then not removed again from the total if the item was successfully rescued. Now just always remove from the total.