path: root/storage.c
* extstore: increase aggressiveness of flush thread  (HEAD, 1.6.20, next, master)
  dormando, 2023-05-11, 1 file changed, -8/+15

  Thread wasn't keeping up in high load scenarios with low/default free ratios.
* core: simplify background IO API
  dormando, 2023-01-11, 1 file changed, -23/+26

  - removes unused "completed" IO callback handler
  - moves primary post-IO callback handlers from the queue definition to the actual IO objects
  - allows IO object callbacks to be handled generically instead of based on the queue they were submitted from
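  As a rough illustration of the direction (not memcached's actual definitions; every name and layout below is an assumption), the callback now rides on each IO object, so the worker thread can return a mixed batch generically:

      #include <stdio.h>

      struct io_pending;                          /* forward declaration for the callback type */
      typedef void (*io_return_cb)(struct io_pending *io);

      typedef struct io_pending {
          int result;                             /* filled in by the IO thread */
          io_return_cb return_cb;                 /* callback lives on the IO object now */
          void *data;                             /* caller-specific payload */
      } io_pending_t;

      typedef struct {
          io_pending_t *stack[16];                /* queued IO objects */
          int count;
          /* before: a single return_cb here covered the whole queue */
      } io_queue_t;

      static void storage_return_cb(io_pending_t *io) {
          printf("storage IO done: result=%d\n", io->result);
      }

      /* The worker can return every IO generically, whatever queue it came from. */
      static void return_all(io_queue_t *q) {
          for (int i = 0; i < q->count; i++) {
              q->stack[i]->return_cb(q->stack[i]);
          }
          q->count = 0;
      }

      int main(void) {
          io_pending_t io = { .result = 42, .return_cb = storage_return_cb, .data = NULL };
          io_queue_t q = { .stack = { &io }, .count = 1 };
          return_all(&q);
          return 0;
      }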
* core: give threads unique names
  dormando, 2022-11-01, 1 file changed, -0/+2

  allow users to differentiate thread functions externally to memcached. Useful for setting priorities or pinning threads to CPUs.
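  For context, a minimal sketch of how a thread can name itself on Linux with pthread_setname_np() (glibc); the "mc-storage" name and the thread function are made-up examples, not necessarily what memcached sets:

      #define _GNU_SOURCE                         /* for pthread_setname_np/getname_np */
      #include <pthread.h>
      #include <stdio.h>

      static void *storage_write_thread(void *arg) {
          /* Linux limits thread names to 15 characters plus the NUL terminator. */
          pthread_setname_np(pthread_self(), "mc-storage");
          char name[16];
          pthread_getname_np(pthread_self(), name, sizeof(name));
          printf("thread name: %s\n", name);      /* visible in per-thread views of top/ps */
          return arg;
      }

      int main(void) {
          pthread_t tid;
          pthread_create(&tid, NULL, storage_write_thread, NULL);
          pthread_join(tid, NULL);
          return 0;
      }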
* extstore: make defaults more aggressive
  dormando, 2022-08-25, 1 file changed, -10/+10

  extstore has a background thread which examines slab classes for items to flush to disk. The thresholds for flushing to disk are managed by a specialized "slab automove" algorithm. This algorithm was written in 2017 and not tuned since.

  Most serious users set "ext_item_age=0" and force flush all items. This is partially because the defaults do not flush aggressively enough, which causes memory to run out and evictions to happen.

  This change simplifies the slab automove portion. Instead of balancing free chunks of memory per slab class, it sets a target of a certain number of free global pages. The extstore flusher thread also uses the page pool and some low chunk limits to decide when to start flushing. Its sleep routines have also been adjusted as it could oversleep too easily.

  A few other small changes were required to avoid over-moving slab pages around.
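  A toy sketch of the decision described above (all names, fields, and numbers are assumptions for illustration, not the real automove code): flush when the global free page pool drops below a target, or when a class is nearly out of free chunks:

      #include <stdbool.h>
      #include <stdio.h>

      struct pool_state {
          unsigned int free_pages;                /* pages currently in the global pool */
          unsigned int free_page_target;          /* desired minimum free pages */
          unsigned int free_chunks;               /* free chunks in the class being examined */
          unsigned int low_chunk_limit;           /* "low chunk" floor per class */
      };

      static bool should_flush_to_storage(const struct pool_state *s) {
          /* flush when short on global pages, or when a class is nearly out of chunks */
          return s->free_pages < s->free_page_target
              || s->free_chunks < s->low_chunk_limit;
      }

      int main(void) {
          struct pool_state s = { .free_pages = 2, .free_page_target = 8,
                                  .free_chunks = 500, .low_chunk_limit = 128 };
          printf("flush? %s\n", should_flush_to_storage(&s) ? "yes" : "no");
          return 0;
      }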
* storage: parameterize the compaction thread sleep
  dormando, 2022-02-21, 1 file changed, -9/+20

  allows tests to run faster and lets users make it sleep for more or less time. Also cuts the sleep time down when actively compacting and coming from high idle.
* Fix time-of-check time-of-use bugs
  kokke, 2021-11-23, 1 file changed, -1/+1

  Fixing 'bugs' of the pattern: 'assert(ptr != 0)' after 'ptr' was already dereferenced.
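  The pattern being fixed, reduced to a toy example (the struct and functions are illustrative, not the original code): a NULL assert placed after the pointer has already been dereferenced can never catch anything, so the check has to move before the first use:

      #include <assert.h>
      #include <stdio.h>

      struct item { int nbytes; };

      static int item_size_buggy(struct item *it) {
          int n = it->nbytes;                     /* dereference happens here... */
          assert(it != 0);                        /* ...so this check is useless */
          return n;
      }

      static int item_size_fixed(struct item *it) {
          assert(it != 0);                        /* check before the first dereference */
          return it->nbytes;
      }

      int main(void) {
          struct item it = { .nbytes = 64 };
          printf("%d %d\n", item_size_buggy(&it), item_size_fixed(&it));
          return 0;
      }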
* core: io_queue flow second attempt
  dormando, 2021-08-09, 1 file changed, -8/+11

  probably squash into previous commit.

  io->c->thread can change for orphaned IOs, so we had to directly add the original worker thread as a reference. also tried again to split callbacks onto the thread and off of the connection for similar reasons; sometimes we just need the callbacks, sometimes we need both.
* core: io_queue_t flow mode
  dormando, 2021-08-09, 1 file changed, -4/+7

  instead of passing ownership of (io_queue_t)*q to the side thread, ownership of the individual IO objects is passed to the side thread, and they are returned one by one. The worker thread runs return_cb() on each, determining when it's done with the response batch.

  this interface could use more explicit functions to make it clearer. Ownership of *q isn't actually "passed" anywhere; it's just used or not used depending on which return function the owner wants.
* extstore: fix crash on 'stats extstore'
  dormando, 2021-06-07, 1 file changed, -0/+3

  crashes if extstore wasn't enabled. Reported by @zer0e on GitHub.
* queue: replace c->io_pending to avoid a mutex
  dormando, 2020-10-30, 1 file changed, -8/+6

  since multiple queues can be sent to different side threads, we need a new mechanism for knowing when to return everything. In the common case only one queue will be active, so adding a mutex would be excessive.
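  A minimal sketch of the idea (struct names and the fixed queue count are assumptions): each queue keeps its own pending count, and since IOs are handed back on the worker thread, the worker can check all counts without a lock:

      #include <stdbool.h>
      #include <stdio.h>

      #define IO_QUEUE_COUNT 2

      typedef struct {
          int type;
          int count;                              /* IOs still outstanding for this queue */
      } io_queue_t;

      typedef struct {
          io_queue_t io_queues[IO_QUEUE_COUNT];
      } conn;

      /* Runs on the worker thread as each IO is returned, so only the worker
       * ever touches these counters and no mutex is needed. */
      static bool conn_io_complete(conn *c, int queue_idx) {
          c->io_queues[queue_idx].count--;
          for (int i = 0; i < IO_QUEUE_COUNT; i++) {
              if (c->io_queues[i].count > 0)
                  return false;                   /* a side thread still owes us IOs */
          }
          return true;                            /* everything returned; resume the conn */
      }

      int main(void) {
          conn c = { .io_queues = { { 0, 1 }, { 1, 0 } } };
          printf("done? %s\n", conn_io_complete(&c, 0) ? "yes" : "no");
          return 0;
      }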
* core: restructure IO queue callbacks
  dormando, 2020-10-30, 1 file changed, -18/+22

  mc_resp is the proper owner of a pending IO once it's been initialized; release it during resp_finish(). Also adds a completion callback which runs on the submitted stack after returning to the worker thread but before the response is transmitted.

  allows re-queueing a pending IO if processing a response generates another pending IO. also allows a further refactor to run more extstore code on the worker thread instead of the IO threads.

  uses a proper conn_io_queue state to describe connections waiting for pending IOs.
* core: io_pending_t is an embeddable struct
  dormando, 2020-10-30, 1 file changed, -30/+52

  reserve space in an io_pending_t. users cast it to a more specific structure, avoiding extra allocations for local data. In this case what might require 3 allocs stays as just 1.
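  A sketch of the embedding pattern (sizes, field names, and the cast discipline are illustrative assumptions): the generic io_pending_t reserves spare space, and a storage-specific struct overlays it so one allocation carries both:

      #include <stdio.h>
      #include <stdlib.h>

      #define IO_PENDING_RESERVED 96

      /* Generic object handed to the IO subsystem. */
      typedef struct {
          int queue_type;
          char reserved[IO_PENDING_RESERVED];     /* room for the specific struct's fields */
      } io_pending_t;

      /* Storage-specific overlay: same leading field, then its own data. */
      typedef struct {
          int queue_type;
          unsigned int page_id;
          unsigned int offset;
          unsigned int len;
      } io_pending_storage_t;

      _Static_assert(sizeof(io_pending_storage_t) <= sizeof(io_pending_t),
                     "specific struct must fit inside the reserved space");

      int main(void) {
          /* one allocation instead of separate allocs for generic + specific data */
          io_pending_t *io = calloc(1, sizeof(*io));
          if (io == NULL) return 1;
          io_pending_storage_t *sio = (io_pending_storage_t *)io;
          sio->page_id = 3;
          sio->offset = 4096;
          sio->len = 512;
          printf("page=%u offset=%u len=%u\n", sio->page_id, sio->offset, sio->len);
          free(io);
          return 0;
      }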
* core: move more storage functions to storage.c
  dormando, 2020-10-30, 1 file changed, -1/+618

  extstore.h is now only used from storage.c. starting a path towards getting the storage interface to be more generalized. should be no functional changes.
* core: generalize extstore's deferred IO queue
  dormando, 2020-10-30, 1 file changed, -0/+84

  want to reuse the deferred IO system for extstore for something else. Should allow evolving into a more plugin-centric system.

  - step one of three(?): replace in place; tests pass with extstore enabled.
  - step two should move more extstore code into storage.c.
  - step three should build the IO queue code without ifdef gating.
* extstore: fix some valgrind errors.
  dormando, 2020-04-11, 1 file changed, -0/+1
* Add stdio.h,stddef.h to storage.c
  minkikim89, 2020-03-09, 1 file changed, -0/+2
* move mem_requested from slabs.c to items.c
  dormando, 2019-07-26, 1 file changed, -1/+1

  mem_requested is an oddball counter: it's the total number of bytes "actually requested" from the slab's caller. It's mainly used for a stats counter, alerting the user that the slab factor may not be efficient if the gap between total_chunks * chunk_size and mem_requested is large.

  However, since chunked items were added it's _also_ used to help the LRU balance itself. The total number of bytes used in the class vs the total number of bytes in a sub-LRU is used to judge whether to move items between sub-LRUs. This is a layer violation; it forces slabs.c to know more about how items work, as well as EXTSTORE for calculating item sizes from headers.

  Further, it turns out it wasn't necessary for item allocation: if we need to evict an item we _always_ pull from COLD_LRU or force a move from HOT_LRU. So the total doesn't matter.

  The total does matter in the LRU maintainer background thread. However, this thread caches mem_requested to avoid hitting the slab lock too frequently. Since sizes_bytes[] within items.c is generally redundant with mem_requested, we now total sizes_bytes[] from each sub-LRU before starting a batch of LRU juggles.

  This simplifies the code a bit, reduces the layer violations in slabs.c slightly, and actually speeds up some hot paths as a number of branches and operations are removed completely. This also fixes an issue I was having with the restartable memory branch :) recalculating p->requested and keeping a clean API is painful and slow.

  NOTE: This will vary a bit compared to what mem_requested originally did, mostly for large chunked items. For items which fit inside a single slab chunk, the stat is identical. However, items constructed by chaining chunks will have a single large "nbytes" value and end up in the highest slab class. Chunked items can be capped with chunks from smaller slab classes; you will see utilization of chunks but not an increase in mem_requested for them. I'm still thinking this through but this is probably acceptable. Large chunked items should be accounted for separately, perhaps with some new counters so they can be discounted from normal calculations.
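  The core of the change, as a toy sketch (the sub-LRU layout, array indexing, and names are assumptions): total the per-sub-LRU byte counters kept in items.c instead of reading a cached mem_requested from slabs.c:

      #include <stdint.h>
      #include <stdio.h>

      #define MAX_CLASSES 64
      #define NLRU 4                              /* e.g. HOT, WARM, COLD, TEMP */

      static uint64_t sizes_bytes[NLRU * MAX_CLASSES];   /* per-sub-LRU byte totals */

      /* Summed once before a batch of LRU juggles; stands in for the old
       * cached mem_requested value from slabs.c. */
      static uint64_t class_bytes_total(int slab_class) {
          uint64_t total = 0;
          for (int lru = 0; lru < NLRU; lru++) {
              total += sizes_bytes[lru * MAX_CLASSES + slab_class];
          }
          return total;
      }

      int main(void) {
          sizes_bytes[0 * MAX_CLASSES + 5] = 1024;       /* HOT bytes in class 5  */
          sizes_bytes[2 * MAX_CLASSES + 5] = 4096;       /* COLD bytes in class 5 */
          printf("class 5 bytes: %llu\n", (unsigned long long)class_bytes_total(5));
          return 0;
      }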
* remove inline_ascii_response option
  dormando, 2019-05-20, 1 file changed, -1/+1

  Has defaulted to false since 1.5.0, and with -o modern for a few years before that. Performance is fine, no reported bugs. Always was the intention. Code is simpler without the options.
* expand NEED_ALIGN for chunked items
  dormando, 2018-08-08, 1 file changed, -1/+1

  some whackarse ARM platforms on specific glibc/gcc (new?) versions trip SIGBUS while reading the header chunk for a split item.

  the header chunk is unfortunate magic: It lives in ITEM_data() at a random offset, is zero sized, and only exists to simplify code around finding the original slab class, and linking/relinking subchunks to an item.

  there's no fix to this which isn't a lot of code. I need to refactor chunked items, and attempted to do so, but couldn't come up with something I liked quickly enough.

  This change pads the first chunk if alignment is necessary, which wastes bytes and a little CPU, but I'm not going to worry a ton for these obscure platforms.

  this works with rebalancing because in the case of an ITEM_CHUNKED header, it treats the item size as the size of the class it resides in, and memcpy's the item during recovery. all other cases were changed from ITEM_data to a new ITEM_schunk() inline function that is created when NEED_ALIGN is set, else it's equal to ITEM_data still.
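  The padding itself amounts to rounding the header chunk's offset up to an aligned boundary; a small sketch (the 8-byte boundary and helper name are assumptions, not memcached's macros):

      #include <stddef.h>
      #include <stdio.h>

      #define CHUNK_ALIGN 8

      /* Round an offset up to the next CHUNK_ALIGN boundary. */
      static size_t align_up(size_t offset) {
          return (offset + (CHUNK_ALIGN - 1)) & ~(size_t)(CHUNK_ALIGN - 1);
      }

      int main(void) {
          size_t raw_offset = 43;                 /* wherever ITEM_data() happened to land */
          size_t padded = align_up(raw_offset);
          printf("raw=%zu padded=%zu wasted=%zu bytes\n",
                 raw_offset, padded, padded - raw_offset);
          return 0;
      }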
* extstore JBOD support
  dormando, 2018-08-06, 1 file changed, -2/+89

  Just a Bunch Of Devices :P

  code exists for routing specific devices to specific buckets (lowttl/compact/etc), but enabling it requires significant fixes to the compaction algorithm. Thus it is disabled as of this writing.

  code cleanups and future work:
  - pedantically freeing memory and closing fd's on exit
  - unify and flatten the free_bucket code
  - defines for free buckets
  - page eviction adjustment (force min-free per free bucket)
  - fix default calculation for compact_under and drop_under
    - might require forcing this value only on default bucket
* split storage writer into its own thread
  dormando, 2018-08-03, 1 file changed, -13/+125

  trying out a simplified slab class backoff algorithm. The LRU maintainer individually schedules slab classes by time, which leads to multiple wakeups in a steady state as they get out of sync.

  This algorithm more simply skips that class more often each time it runs the main loop, using a single scheduled sleep instead. if it goes to sleep for a long time, it also reduces the backoff for all classes. if we're barely awake it should be fine to poke everything.
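  A sketch of the backoff idea (names, constants, and the work signal are assumptions): one loop with a single sleep, where each class that produces no work gets skipped on progressively more passes:

      #include <stdio.h>

      #define MAX_CLASSES 8
      #define MAX_BACKOFF 9

      struct class_state {
          unsigned int backoff;                   /* examine the class every N passes */
          unsigned int counter;                   /* passes since last examined */
      };

      static void writer_pass(struct class_state *cls, const int did_work[MAX_CLASSES]) {
          for (int i = 0; i < MAX_CLASSES; i++) {
              if (++cls[i].counter < cls[i].backoff)
                  continue;                       /* still backing off this class */
              cls[i].counter = 0;
              if (did_work[i]) {
                  cls[i].backoff = 1;             /* busy class: check every pass */
              } else if (cls[i].backoff < MAX_BACKOFF) {
                  cls[i].backoff++;               /* idle class: check less often */
              }
          }
          /* a single sleep for the whole loop goes here; after a long sleep the
           * real thing also relaxes the backoff for every class */
      }

      int main(void) {
          struct class_state cls[MAX_CLASSES] = { { .backoff = 1 } };
          int did_work[MAX_CLASSES] = { 1, 0 };
          writer_pass(cls, did_work);
          printf("class0 backoff=%u, class1 backoff=%u\n", cls[0].backoff, cls[1].backoff);
          return 0;
      }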
* add utility macro to replace repetitive code snippet.
  Linkerist, 2018-06-27, 1 file changed, -8/+1
* alignment and 32bit fixes for extstore
  dormando, 2018-05-22, 1 file changed, -3/+3

  memory alignment when reading header data back. a "32" left in a few places, which should've at least been a define, is now properly an offsetof. used for skipping the crc32 over the dynamic parts of the item headers.
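  The offsetof() part, in miniature (the header layout below is invented for illustration; the real item header differs): derive the number of bytes to skip from the struct layout instead of hardcoding 32:

      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      struct item_hdr_demo {
          uint64_t cas;                           /* rewritten after the CRC was taken */
          uint32_t flags;                         /* also dynamic */
          uint32_t exptime;                       /* bytes from here on are stable... */
          uint32_t nbytes;                        /* ...so the CRC covers only this part */
      };

      /* Derive the skip from the layout rather than a magic "32". */
      #define HDR_CRC_START offsetof(struct item_hdr_demo, exptime)

      int main(void) {
          printf("crc starts at byte %zu of the header\n", (size_t)HDR_CRC_START);
          return 0;
      }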
* Spelling fixes
  Josh Soref, 2018-03-14, 1 file changed, -1/+1

  * automover
  * avoiding
  * compress
  * fails
  * successfully
  * success
  * tidiness
* extstore: check malloc in compaction thread.
  dormando, 2017-12-18, 1 file changed, -2/+5

  simple change.
* extstore: tuning drop_unread semantics
  dormando, 2017-12-15, 1 file changed, -5/+6

  there's now an optional ext_drop_under setting which defaults to the same as compact_under, which should be fine.

  now, if drop_unread is enabled, it only kicks in if there are no pages matching the compaction threshold. This allows you to set a lower compaction frag rate, then start rescuing only non-COLD items if storage is too full. You can also compact up to a point, then allow a buffer of pages to be used before dropping unread.

  previously, enabling drop_unread would always drop unread even when compacting normally. This limited the utility of the feature.
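  A toy decision sketch of the new semantics (the struct and thresholds are assumptions; only the option names come from the log): drop_unread is considered only when nothing qualifies for compaction:

      #include <stdbool.h>
      #include <stdio.h>

      struct store_state {
          unsigned int free_pages;
          unsigned int compact_under;             /* compact when free pages drop below this */
          unsigned int drop_under;                /* ext_drop_under: drop-unread threshold */
          bool drop_unread;                       /* ext_drop_unread option */
          bool compactable_page_found;            /* some page met the fragmentation limit */
      };

      static const char *next_action(const struct store_state *s) {
          if (s->free_pages < s->compact_under && s->compactable_page_found)
              return "compact";
          if (s->drop_unread && s->free_pages < s->drop_under)
              return "drop unread";               /* only when nothing can be compacted */
          return "idle";
      }

      int main(void) {
          struct store_state s = { .free_pages = 3, .compact_under = 8, .drop_under = 8,
                                   .drop_unread = true, .compactable_page_found = false };
          printf("action: %s\n", next_action(&s));
          return 0;
      }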
* extstore: add evictions-related write logs
  dormando, 2017-12-15, 1 file changed, -0/+1

  "watch evictions" will show a stream of evictions + writes to extstore. useful for analyzing the remaining ttl or key pattern of stuff being flushed.
* extstore: fix crash while moving extstore chunks
  dormando, 2017-12-14, 1 file changed, -0/+1

  ITEM_LINKED was still set on the objects being written to disk. If a page being moved contains a chunk read from extstore that is currently being written back to the client, it will mistake it for a properly linked chunk and attempt to unlink it if the page has also been jammed in the mover.

  So you need a cross section of a particular chunk being active the entire time a page is jammed; then it tries to unlink it, but the header is partially garbage and it segfaults.

  The page mover ignores all items which don't have either LINKED or SLABBED set, assuming they're in transition. So the fix is to simply remove the LINKED bit after copying into the write buffer.
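  The fix boils down to clearing the flag on the copy, not the live item; a minimal sketch (flag values and the stub struct are illustrative):

      #include <stdint.h>
      #include <stdio.h>

      #define ITEM_LINKED (1 << 0)

      struct item_stub {
          uint16_t it_flags;
          char data[32];
      };

      /* Copy the item header into the write buffer and strip ITEM_LINKED from
       * the copy, so the page mover treats it as "in transition" and skips it. */
      static struct item_stub copy_for_write_buffer(const struct item_stub *it) {
          struct item_stub copy = *it;
          copy.it_flags &= (uint16_t)~ITEM_LINKED;
          return copy;
      }

      int main(void) {
          struct item_stub live = { .it_flags = ITEM_LINKED, .data = "hello" };
          struct item_stub wcopy = copy_for_write_buffer(&live);
          printf("live flags=%u, write-buffer flags=%u\n", live.it_flags, wcopy.it_flags);
          return 0;
      }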
* extstore: oh god why
  dormando, 2017-12-11, 1 file changed, -0/+1

  I was so sure I'd triggered this with tests...
* extstore: fix size tracking and adjust drop_unread
  dormando, 2017-12-08, 1 file changed, -1/+13

  was early-evicting from the HOT/WARM LRUs for item headers because the *original* item size was being tracked, then compared to the actual byte totals for the class.

  also adjusts drop_unread so it drops items which are currently in the COLD_LRU.

  this is expected to be used with very low compact_under values; ie 2-5 depending on page count and write load. If you can't defrag-compact, drop-compact. but this is still subtly wrong, since drop_compact is now an option.
* extstore: experimental per-class free chunk limit
  dormando, 2017-12-03, 1 file changed, -3/+2

  external commands only for the moment. allows specifying per slab class how many chunks to leave free before causing flushing to storage.

  the external page mover algo in previous commits has a few issues:
  - relies too heavily on the page mover. lots of constant activity under load.
  - adjusting the item age level on the fly is too laggy, and can easily over-free or under-free. IE; class 3 has TTL 90, but class 4 has TTL 60 and most of the pages in memory, so it won't free much until it lowers to 60.

  Thinking this would be something like a % of total chunks in the slab class. easiest to set as a percentage of total memory or by write rate periodically.

  from there TTL can be balanced almost as in the original algorithm; keep a small global page pool for small items to allocate memory from, and pull pages from or balance between storage-capable classes to align TTL.
* extstore: ext_compact_under to control compaction
  dormando, 2017-11-28, 1 file changed, -1/+1

  had a hardcoded value of "start to compact under a slew if more than 3/4ths of pages are used", but this allows it to be set directly. ie; "I have 100 pages but don't want to compact until almost full, and then drop any unread"
* extstore: add ext_drop_unread option + live tune
  dormando, 2017-11-28, 1 file changed, -7/+4

  was struggling to figure out how to automatically turn this on or off, but I think it should be part of an outside process. ie; a mechanism should be able to target a specific write rate, and one of its tools for reducing the write rate should be flipping this on.

  there's *still* a hole where you can't trigger a compaction attempt if there's no fragmentation. I kind of want, if this feature is on, to attempt a compaction on the oldest page while dropping unread items.
* extstore: crawler fix and ext_low_ttl option
  dormando, 2017-11-28, 1 file changed, -2/+8

  LRU crawler was not marking reclaimed expired items as removed from the storage engine. This could cause fragmentation to persist much longer than it should, but would not cause any problems once compaction started.

  Adds "ext_low_ttl" option. Items with a remaining expiration age below this value are grouped into special pages. If you have a mixed TTL workload this would help prevent low TTL items from causing excess fragmentation/compaction. Pages with low ttl items are excluded from compaction.
* extstore: fix a few more missing ifdefs
  dormando, 2017-11-28, 1 file changed, -0/+3
* extstore: pause compaction thread with hash expand
  dormando, 2017-11-28, 1 file changed, -0/+14

  could potentially cause weirdness when the hash table is swapped.
* extstore: minor bugfixes
  dormando, 2017-11-28, 1 file changed, -1/+1

  - refuse to start if inline_ascii_resp is enabled, due to it breaking the item header objects.
  - actually use the iovst value passed during binprot requests.
  - make the item flags converters the same (strtoul can eat leading space). need to replace them with a function still.
* extstore: support chunked items.
  dormando, 2017-11-28, 1 file changed, -57/+86

  item size max must be <= wbuf_size. reads into iovecs, writes out of same iovecs.
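  The read-into-iovecs / write-out-of-the-same-iovecs flow, demonstrated with plain POSIX readv()/writev() (the chunk sizes and temp file are stand-ins; extstore manages its own pages and offsets):

      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/uio.h>
      #include <unistd.h>

      int main(void) {
          char chunk_a[8] = "chunk-a", chunk_b[8] = "chunk-b";
          struct iovec iov[2] = {
              { .iov_base = chunk_a, .iov_len = sizeof(chunk_a) },
              { .iov_base = chunk_b, .iov_len = sizeof(chunk_b) },
          };

          int fd = open("/tmp/extstore-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
          if (fd < 0) return 1;

          /* write the item out of its chunk iovecs... */
          if (writev(fd, iov, 2) < 0) return 1;

          /* ...and later read it straight back into the same iovecs */
          memset(chunk_a, 0, sizeof(chunk_a));
          memset(chunk_b, 0, sizeof(chunk_b));
          if (lseek(fd, 0, SEEK_SET) < 0 || readv(fd, iov, 2) < 0) return 1;

          printf("%s %s\n", chunk_a, chunk_b);
          close(fd);
          unlink("/tmp/extstore-demo");
          return 0;
      }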
* extstore: skip unhit objects if full in compaction
  dormando, 2017-11-28, 1 file changed, -7/+24

  if < 2 free pages left, "evict" objects which haven't been hit at all. should be better than evicting everything if we can continue compacting.
* extstore: split write into write_request+write
  dormando, 2017-11-28, 1 file changed, -16/+12

  write_request returns a buffer to write into, which lets us not corrupt the active item with the hash and crc. "technically" we can save 24 bytes per item in storage but I'll leave that for a later optimization, in case we want to stuff more data into the header.
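  The shape of the two-step API, sketched with invented signatures (not extstore's real ones): write_request() reserves space in the write buffer, the caller fills that copy with the item plus hash/crc, and the live item is never touched:

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define WBUF_SIZE 4096

      struct wbuf {
          char data[WBUF_SIZE];
          size_t used;
      };

      /* Reserve len bytes in the write buffer; return a pointer the caller fills. */
      static void *storage_write_request(struct wbuf *w, size_t len) {
          if (w->used + len > WBUF_SIZE) return NULL;
          void *p = w->data + w->used;
          w->used += len;
          return p;
      }

      int main(void) {
          struct wbuf w = { .used = 0 };
          const char item[] = "item bytes";
          char *dst = storage_write_request(&w, sizeof(uint32_t) + sizeof(item));
          if (dst == NULL) return 1;

          uint32_t crc = 0xdeadbeefu;             /* stand-in for the real crc32 */
          memcpy(dst, &crc, sizeof(crc));         /* header extras go into the copy... */
          memcpy(dst + sizeof(crc), item, sizeof(item));   /* ...not into the live item */
          printf("buffer used: %zu bytes\n", w.used);
          return 0;
      }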
* external storage base commit
  dormando, 2017-11-28, 1 file changed, -0/+379

  been squashing, reorganizing, and pulling code off to go upstream ahead of merging the whole branch.