| Commit message | Author | Age | Files | Lines |
| |
| |
We want to start using cache commands in contexts without a client
connection, but the client object has always been passed to all
functions.
In most cases we only need the worker thread (LIBEVENT_THREAD *t), so
this change adjusts the arguments passed in.
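A minimal sketch of the shape of this change (the stand-in types and
the helper signature here are illustrative, not memcached's exact API):

    #include <stddef.h>

    typedef struct { int id; } LIBEVENT_THREAD;   /* stand-in worker thread */
    typedef struct conn { LIBEVENT_THREAD *thread; } conn;
    typedef struct item item;

    /* before: helpers required a full client connection:
     *   item *item_get(const char *key, size_t nkey, conn *c); */

    /* after: helpers take only the worker-thread context, so they can
     * also be called where no client connection exists */
    item *item_get(const char *key, size_t nkey, LIBEVENT_THREAD *t);

    /* call sites that do have a client simply pass c->thread */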
|
| |
allow users to differentiate thread functions externally to memcached.
Useful for setting priorities or pinning threads to CPUs.
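As an illustration of the use case this enables (not part of this
change itself): once a thread is identifiable, an operator-side tool
can pin it with the glibc pthread_setaffinity_np() call.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* pin an identified worker thread to a given CPU */
    static int pin_thread(pthread_t tid, int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(tid, sizeof(set), &set);
    }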
|
| |
When allocating sub-max chunks for the tail end of a large item the
allocator would only look at the exact slab class. If items in a cache
are all exclusively large, these slab classes could be empty. Now as a
fallback it will also check and evict from the largest slab class even
if it doesn't necessarily want the largest chunk.
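A hedged sketch of the fallback logic (function names are illustrative,
not memcached's real interface):

    #include <stddef.h>

    extern void *slabs_pop_free(unsigned int clsid);   /* illustrative */
    extern void *evict_then_pop(unsigned int clsid);   /* illustrative */

    void *alloc_tail_chunk(unsigned int clsid, unsigned int largest) {
        void *chunk = slabs_pop_free(clsid);    /* exact-fit class first */
        if (chunk == NULL && clsid != largest) {
            /* fallback: evict from the largest class even though a
             * smaller chunk was wanted, rather than failing outright */
            chunk = evict_then_pop(largest);
        }
        return chunk;
    }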
|
| |
fixes failing tests and scenarios where a lot of memory is freed up at
once.
|
| |
extstore has a background thread which examines slab classes for items
to flush to disk. The thresholds for flushing to disk are managed by a
specialized "slab automove" algorithm. This algorithm was written in
2017 and not tuned since.
Most serious users set "ext_item_age=0" and force flush all items. This
is partially because the defaults do not flush aggressively enough,
which causes memory to run out and evictions to happen.
This change simplifies the slab automove portion. Instead of balancing
free chunks of memory per slab class, it sets a target of a certain
number of free global pages.
The extstore flusher thread also uses the page pool and some low chunk
limits to decide when to start flushing. Its sleep routines have also
been adjusted as it could oversleep too easily.
A few other small changes were required to avoid over-moving slab pages
around.
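Roughly, the new policy behaves like this sketch (all names here are
assumptions, not the real code):

    extern unsigned int global_page_pool_size(void);   /* illustrative */
    extern int reclaim_one_slab_page(void);            /* illustrative */

    /* old: balance free chunks per slab class.
     * new: hold a flat target of free global pages. */
    void slab_automove_tick(const unsigned int target_free_pages) {
        while (global_page_pool_size() < target_free_pages) {
            if (reclaim_one_slab_page() != 0)
                break;    /* nothing reclaimable; retry next tick */
        }
    }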
|
| |
- Skip using crawler items when calculating the automover age stats as
they can severely skew the ages in the stats to the point of completely
starving particular slabs
- Include the current window data in the window sum so we don't free
pages that are actually needed - this also matches the python script
behavior
- Reset young / old when interrupting the automove decision loop so we
don't accidentally move things which we didn't mean to
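The first fix amounts to a guard in the age sampling, roughly like this
sketch (the item layout and the predicate are illustrative stand-ins):

    #include <stdint.h>

    typedef unsigned int rel_time_t;
    typedef struct item { struct item *prev; rel_time_t time; } item;
    extern int is_crawler_placeholder(const item *it);   /* illustrative */

    /* average tail age, skipping crawler pseudo-items whose bogus
     * timestamps would otherwise starve a class of page moves */
    uint64_t tail_age_avg(const item *tail, unsigned int depth,
                          rel_time_t now) {
        uint64_t total = 0, n = 0;
        for (const item *it = tail; it != NULL && depth-- > 0; it = it->prev) {
            if (is_crawler_placeholder(it))
                continue;
            total += now - it->time;
            n++;
        }
        return n ? total / n : 0;
    }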
|
| |
This adds a new field `size` to logger entry lines for item_get,
item_store, and eviction events indicating the size of the associated
item in bytes.
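A watched fetch line might now look like this (illustrative only; the
exact fields vary by event type):

    ts=1500000000.123456 gid=42 type=item_get key=foo status=found size=1024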
|
| |
cachedump was the only place in the codebase I could find which copied
the key verbatim. wonder when I can finally remove the command :)
|
| |
extstore.h is now only used from storage.c. starting a path towards
getting the storage interface to be more generalized.
should be no functional changes.
|
| |
- `slabs_rebalance_lock`
- `slab_rebalance_cond`
- `maintenance_lock`
- `lru_crawler_lock`
- `lru_crawler_cond`
- `lru_maintainer_lock`
|
| |
The last access time used to only update once per minute to avoid
excess bumping on hot items. However, with segmented mode if an item is
hit a lot it's simply poked in place.
Prior to this change we were calling extra functions and branches
for no real reason. Also, when bumping within the WARM_LRU, we were
updating the last access time despite it being a shuffle, and it was
skipping the bump if the access time was too recent, which is one hell
of a bug.
|
| |
- we get asked a lot to provide a "metaget" command, for various uses
(debugging, etc)
- we also get asked for random one-off commands for various use cases.
- I really hate both of these situations and have been wanting to
experiment with a slight tuning of how get commands work for a long
time.
I assume that if I offer a metaget command which gives people the
information they're curious about in an inefficient format, plus data
they don't need, we'll just end up with a slow command with
compatibility issues. No matter how many warnings you wrap around a
command, people will put it into production under high load. Then I'm
stuck with it forever.
Behold, the meta commands!
See doc/protocol.txt and the wiki for a full explanation and examples.
The intent of the meta commands is to support any features the binary
protocol had over the text protocol. Though this is missing some
commands still, it is close and surpasses the binary protocol in many
ways.
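An illustrative meta-get exchange, using the flag syntax from current
doc/protocol.txt (the exact tokens evolved a bit after this commit):
requesting value (v), client flags (f), and TTL (t) returns a VA line
carrying the value size plus the requested flags, while a miss returns
EN.

    mg foo v f t
    VA 3 f0 t42
    bar
    mg missing v
    EN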
|
| |
"-e /path/to/tmpfsmnt/file"
SIGUSR1 for graceful stop
restart requires the same memory limit, slab sizes, and some other
infrequently changed details. Most other options and features can
change between restarts. Binary can be upgraded between restarts.
Restart does some fixup work on startup for every item in cache; this
can take over a minute with more than a few hundred million items.
Keep in mind when a cache is down it may be missing invalidations,
updates, and so on.
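Illustrative usage, assuming a tmpfs mount at /tmpfs_mnt (the path and
memory limit are examples):

    memcached -m 4096 -e /tmpfs_mnt/memory_file
    kill -USR1 $(pidof memcached)                  # graceful stop
    memcached -m 4096 -e /tmpfs_mnt/memory_file    # warm restart, same -m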
|
| |
you can now monitor fetches and mutations of a given client
|
| |
mem_requested is an oddball counter: it's the total number of bytes
"actually requested" from the slab's caller. It's mainly used for a
stats counter, alerting the user that the slab factor may not be
efficient if the gap between total_chunks * chunk_size - mem_requested
is large.
However, since chunked items were added it's _also_ used to help the
LRU balance itself. The total number of bytes used in the class vs the
total number of bytes in a sub-LRU is used to judge whether to move
items between sub-LRU's.
This is a layer violation; forcing slabs.c to know more about how items
work, as well as EXTSTORE for calculating item sizes from headers.
Further, it turns out it wasn't necessary for item allocation: if we
need to evict an item we _always_ pull from COLD_LRU or force a move
from HOT_LRU. So the total doesn't matter.
The total does matter in the LRU maintainer background thread. However,
this thread caches mem_requested to avoid hitting the slab lock too
frequently. Since sizes_bytes[] within items.c is generally redundant
with mem_requested, we now total sizes_bytes[] from each sub-LRU before
starting a batch of LRU juggles.
This simplifies the code a bit, reduces the layer violations in slabs.c
slightly, and actually speeds up some hot paths as a number of branches
and operations are removed completely.
This also fixes an issue I was having with the restartable memory
branch :) recalculating p->requested and keeping a clean API is painful
and slow.
NOTE: This will vary a bit compared to what mem_requested originally
did, mostly for large chunked items.
For items which fit inside a single slab chunk, the stat is identical.
However, items constructed by chaining chunks will have a single large
"nbytes" value and end up in the highest slab class. Chunked items can
be capped with chunks from smaller slab classes; you will see
utilization of chunks but not an increase in mem_requested for them.
I'm still thinking this through but this is probably acceptable. Large
chunked items should be accounted for separately, perhaps with some new
counters so they can be discounted from normal calculations.
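The totalling step is simple; a sketch using the sub-LRU id layout from
items.c (the helper itself is illustrative):

    #include <stdint.h>

    /* sub-LRU ids are the slab class id OR'd with the LRU type bits:
     * HOT_LRU 0, WARM_LRU 64, COLD_LRU 128, TEMP_LRU 192 */
    uint64_t lru_total_bytes(const uint64_t *sizes_bytes, unsigned int clsid) {
        return sizes_bytes[clsid]           /* HOT_LRU */
             + sizes_bytes[clsid | 64]      /* WARM_LRU */
             + sizes_bytes[clsid | 128]     /* COLD_LRU */
             + sizes_bytes[clsid | 192];    /* TEMP_LRU */
    }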
|
| |
did a weird dance. nsuffix is no longer an 8bit length, replaced with
ITEM_CFLAGS bit. This indicates whether there is a 32bit set of
client flags in the item or not.
This became possible after removing the inlined ascii response header
in the previous commit.
|
| |
Has defaulted to false since 1.5.0, and with -o modern for a few years
before that. Performance is fine, no reported bugs. Always was the
intention. Code is simpler without the options.
|
| |
some whackarse ARM platforms on specific glibc/gcc (new?) versions trip
SIGBUS while reading the header chunk for a split item.
the header chunk is unfortunate magic: It lives in ITEM_data() at a random
offset, is zero sized, and only exists to simplify code around finding the
original slab class, and linking/relinking subchunks to an item.
there's no fix to this which isn't a lot of code. I need to refactor chunked
items, and attempted to do so, but couldn't come up with something I liked
quickly enough.
This change pads the first chunk if alignment is necessary, which wastes
bytes and a little CPU, but I'm not going to worry a ton for these obscure
platforms.
this works with rebalancing because in the case of ITEM_CHUNKED header, it
treats the item size as the size of the class it resides in, and memcpy's the
item during recovery.
all other cases were changed from ITEM_data to a new ITEM_schunk()
inline function, which is used when NEED_ALIGN is set; otherwise it's
still equal to ITEM_data.
|
| |
trying out a simplified slab class backoff algorithm. The LRU maintainer
individually schedules slab classes by time, which leads to multiple wakeups
in a steady state as they get out of sync. This algorithm more simply skips
that class more often each time it runs the main loop, using a single
scheduled sleep instead.
if it goes to sleep for a long time, it also reduces the backoff for all
classes. if we're barely awake it should be fine to poke everything.
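A sketch of the skip counters (names and bounds are illustrative):

    #define MAX_CLASSES 64                 /* illustrative bound */
    extern int juggle_class(int i);        /* work done this pass; illustrative */

    static unsigned int skip[MAX_CLASSES];
    static unsigned int backoff[MAX_CLASSES];

    void maintainer_pass(void) {
        for (int i = 0; i < MAX_CLASSES; i++) {
            if (skip[i] > 0) { skip[i]--; continue; }   /* backing off */
            if (juggle_class(i) == 0) {
                /* idle class: skip it more often next time */
                backoff[i] = backoff[i] ? backoff[i] * 2 : 1;
                skip[i] = backoff[i];
            } else {
                backoff[i] = 0;            /* active: check every pass */
            }
        }
        /* one shared sleep follows; after a long sleep the caller
         * reduces every backoff, since poking everything is cheap */
    }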
|
| |
LRU crawler metadumper is used for getting snapshot-y looks at the LRU's.
Since there's no default limit, it'll get any new items added or bumped since
the roll started.
with this change it limits the number of items dumped to the number that
existed in that LRU when the roll was kicked off. You still end up with an
approximation, but not a terrible one:
- items bumped after the crawler passes them likely won't be revisited
- items bumped before the crawler passes them will likely be visited toward
the end, or mixed with new items.
- deletes are somewhere in the middle.
|
| |
items expired/evicted while pulling from tail weren't being tracked, leading
to a leak of object counts in pages.
|
| |
was early evicting from HOT/WARM LRU's for item headers because the
*original* item size was being tracked, then compared to the actual byte
totals for the class.
also adjusts drop_unread so it drops items which are currently in the COLD_LRU
this is expected to be used with very low compact_under values; ie 2-5
depending on page count and write load. If you can't defrag-compact,
drop-compact.
but this is still subtly wrong, since drop_compact is now an option.
|
| |
a couple of TODO items are left for a new issue I thought of. Also a
hardcoded memory buffer size which should be fixed.
also need to change the "free and re-init" logic to use a boolean in
case any related option changes.
|
| |
./configure --enable-extstore to compile the feature in
specify -o ext_path=/whatever to start.
|
| |
been squashing, reorganizing, and pulling code off to go upstream ahead
of merging the whole branch.
|
| |
use slightly more modern syntax :P
Wasn't really a bug since it'd just sleep too much for a bit or cap it back
to the MAX. This will give more consistent behavior though.
Thanks to shqking on github for the report
|
| |
used in a separate file for the flash branch.
|
| |
removes a few ifdef's and upstreams small internal interface tweaks for easy
rebase.
|
| |
plumbing for doing inline reclaim, or similar.
|
| |
was defaulting to HOT, but HOT can be empty pretty easily and this can be
confusing.
|
| |
too used to thread stacks being several megabytes, maybe :) The crawlerstats
gets a bit big, so do a normal memory allocation for it. seems to work, but I
can't run tests on musl without making the debug binary build.
|
| |
If the size of the flags is 0, it can easily mean to not store anything at
all.
|
| |
defaults to 20% of COLD age.
hot_max_age was added because many people's caches were sitting at 32% memory
utilized (exactly the size of hot). Capping the LRU's by percentage and age
would promote some fairness, but I made a mistake making WARM dynamic but HOT
static. This is now fixed.
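The cap now works roughly like this sketch (names are illustrative;
0.2 mirrors the new 20%-of-COLD-age default):

    #include <stdint.h>

    extern void demote_hot_tail(void);     /* illustrative */

    void maybe_cap_hot(unsigned int hot_age, unsigned int cold_age,
                       uint64_t hot_bytes, uint64_t class_bytes,
                       double hot_pct) {
        /* demote when HOT exceeds its memory share OR 20% of COLD's age */
        if (hot_bytes > class_bytes * hot_pct ||
            hot_age > cold_age * 0.2) {
            demote_hot_tail();
        }
    }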
|
| |
converts the python script to C, more or less.
|
| |
if doing a lot of bumps within WARM LRU, the thread can start to sleep more
often because it thinks it's not completing any work.
|
| |
under enough set pressure some slab classes may never complete scanning, as
there's always something new at the top.
this is a quick workaround for the internal scanner. always use a limit seeded
at the size of the largest class. smaller classes will simply finish sooner.
needs a better fix for the user-based commands. a change to the API would allow
for per-crawler tocrawl values.
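The workaround in sketch form (array and bound names are illustrative):

    #define MAX_CLASSES 64                 /* illustrative bound */

    /* seed every class's crawl budget from the largest class so scans
     * terminate under set pressure; smaller classes just finish sooner */
    void seed_crawl_limits(const unsigned int *items_in_class,
                           unsigned int *tocrawl) {
        unsigned int largest = 0;
        for (int i = 0; i < MAX_CLASSES; i++)
            if (items_in_class[i] > largest)
                largest = items_in_class[i];
        for (int i = 0; i < MAX_CLASSES; i++)
            tocrawl[i] = largest;
    }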
|
| |
no actual speed loss. emulate the slab_stats "get_hits" by totalling up the
per-LRU get_hits.
could sub-LRU many stats, but should use a different command/interface for
that.
|
| |
when trying to manually run a crawl, the internal autocrawler is now blocked
from restarting for 60 seconds.
the internal autocrawl now independently schedules LRU's, and can re-schedule
sub-LRU's while others are still running. should allow much better memory
control when some sub-lru's (such as TEMP or WARM) are small, or slab classes
are differently sized.
this also makes the crawler drop its lock frequently. This fixes an issue
where a long crawl happening at the same time as a hash table expansion could
hang the server until the crawl finished.
to improve still:
- elapsed time can be wrong in the logger entry
- need to cap number of entries scanned. enough set pressure and a crawl may
never finish.
|
| |
added a bug which caused the LRU juggler to never sleep. increased max
sleep time to 1s.
also fixed a bug where every other LRU round had a 0 sleep.
there's still wakeup overkill: once one slab class becomes active, it will
"desync" from the amount of sleep required for other slab classes. They will
each pull the LRU once per second, but the thread wakes up as many times per
second as there are active slab classes.
deprioritizing, but a clean way of re-syncing the slab classes would minimize
wakeups.
|
| |
The logger and lru maintainer threads both adjust how long they sleep
based on how busy they are. This adjustment should be exponential, to
more quickly adjust to workloads.
The logger thread in particular would only adjust 50 microseconds at a
time, and was capped at 100 milliseconds of sleep, causing many
unnecessary wakeups on an otherwise idle dev machine. Adjust this cap to
1 second.
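The adjustment in sketch form (variable names and the floor are
illustrative; the 1 second cap is per this commit):

    #define MIN_SLEEP_US 50
    #define MAX_SLEEP_US 1000000           /* new 1s cap */

    unsigned int adjust_sleep(unsigned int sleep_us, int did_work) {
        if (did_work) {
            sleep_us /= 2;                 /* busy: wake sooner */
            if (sleep_us < MIN_SLEEP_US) sleep_us = MIN_SLEEP_US;
        } else {
            sleep_us = sleep_us ? sleep_us * 2 : MIN_SLEEP_US;
            if (sleep_us > MAX_SLEEP_US) sleep_us = MAX_SLEEP_US;
        }
        return sleep_us;
    }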
|
| |
Memory chunk chains would simply stitch multiple chunks of the highest slab
class together. If your item was 17k and the chunk limit is 16k, the item
would use 32k of space instead of a bit over 17k.
This refactor simplifies the slab allocation path and pulls the allocation of
chunks into the upload process. A "large" item gets a small chunk assigned as
an object header, rather than attempting to inline a slab chunk into a parent
chunk. It then gets chunks individually allocated and added into the chain
while the object uploads.
This solves a lot of issues:
1) When assembling new, potentially very large items, we don't have to sit and
spin evicting objects all at once. If there are 20 16k chunks in the tail and
we allocate a 1 meg item, the new item will evict one of those chunks
in between each read, rather than trying to guess how many loops to run before
giving up. Very large objects take time to read from the socket anyway.
2) Simplifies code around the initial chunk. Originally, embedding data into
the top chunk while also chaining chunks required a good amount of
fiddling. (Though this might flip back to embedding the initial chunk if I can
clean it up a bit more).
3) Pulling chunks individually means the slabber code can be flattened to not
think about chunks aside from freeing them, which culled a lot of code and
removed branches from a hot path.
4) The size of the final chunk is naturally set to the remaining number of
bytes that need to be stored, which means chunks from another slab class can
be pulled to "cap off" a large item, reducing memory overhead.
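The upload loop, sketched (the types, helpers, and 16k constant are
illustrative stand-ins for the approach described above):

    #include <stddef.h>

    typedef struct chunk chunk_t;                     /* stand-in */
    extern chunk_t *alloc_chunk(size_t bytes);        /* may evict one chunk */
    extern int read_into(chunk_t *ch, size_t bytes);  /* next socket read */
    #define CHUNK_MAX (16 * 1024)

    int stream_large_item(size_t remaining) {
        while (remaining > 0) {
            size_t want = remaining < CHUNK_MAX ? remaining : CHUNK_MAX;
            /* the final chunk can come from a smaller slab class */
            chunk_t *ch = alloc_chunk(want);
            if (ch == NULL)
                return -1;            /* eviction failed mid-upload */
            read_into(ch, want);
            remaining -= want;
        }
        return 0;
    }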
|
| |
takes a long time for mc-crusher to find this. I didn't run it long enough :(
|
| |
If LRU maintainer thread is started, this allows you to switch between "flat"
and "segmented" modes at runtime. The maintainer thread will drain HOT/WARM
LRU's if put into flat mode, and no new items should fill in.
This was much easier than expected...
|