"-e /path/to/tmpfsmnt/file"
SIGUSR1 for graceful stop
restart requires the same memory limit, slab sizes, and some other
infrequently changed details. Most other options and features can
change between restarts. Binary can be upgraded between restarts.
Restart does some fixup work on start for every item in cache. Can take
over a minute with more than a few hundred million items in cache.
Keep in mind when a cache is down it may be missing invalidations,
updates, and so on.
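As a usage sketch (paths and sizes hypothetical): start with
"memcached -m 1024 -e /tmpfs_mount/memory_file", stop gracefully with
"kill -USR1 <pid>", then start again with the same -m and -e arguments
to recover the cache contents.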

---

mem_requested is an oddball counter: it is the total number of bytes
"actually requested" from the slab's caller. It mainly feeds a stats
counter that alerts the user the slab factor may be inefficient when
the gap between total_chunks * chunk_size and mem_requested is large.
However, since chunked items were added it is _also_ used to help the
LRU balance itself: the total number of bytes used in the class vs the
total number of bytes in a sub-LRU is used to judge whether to move
items between sub-LRUs.
This is a layer violation: it forces slabs.c to know more about how
items work, and (with EXTSTORE) how to calculate item sizes from
headers. Further, it turns out it wasn't necessary for item
allocation: if we need to evict an item we _always_ pull from COLD_LRU
or force a move from HOT_LRU, so the total doesn't matter there.
The total does matter in the LRU maintainer background thread.
However, that thread cached mem_requested to avoid hitting the slab
lock too frequently, and since sizes_bytes[] within items.c is
generally redundant with mem_requested, we now total sizes_bytes[]
from each sub-LRU before starting a batch of LRU juggles.
This simplifies the code a bit, reduces the layer violations in
slabs.c slightly, and actually speeds up some hot paths, as a number
of branches and operations are removed completely.
It also fixes an issue I was having with the restartable memory
branch: recalculating p->requested while keeping a clean API was
painful and slow.
NOTE: the stat will vary a bit compared to what mem_requested
originally reported, mostly for large chunked items.
For items which fit inside a single slab chunk, the stat is identical.
However, items constructed by chaining chunks have a single large
"nbytes" value and end up counted in the highest slab class, and
chunked items can be capped with chunks from smaller slab classes: you
will see utilization of those chunks but no increase in mem_requested
for them.
I'm still thinking this through, but it is probably acceptable. Large
chunked items should be accounted for separately, perhaps with some
new counters, so they can be discounted from the normal calculations.
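A minimal sketch of the maintainer-side totaling described above; the
names here (sizes_bytes, the sub-LRU flag values, POWER_LARGEST) are
simplified stand-ins for the real items.c structures, not the exact
code:

```c
#include <stdint.h>

#define POWER_LARGEST 256
enum { HOT_LRU = 0, WARM_LRU = 64, COLD_LRU = 128, TEMP_LRU = 192 };

/* Per-(class, sub-LRU) byte totals, maintained under the LRU locks. */
static uint64_t sizes_bytes[POWER_LARGEST];

/* Total bytes for slab class `id` across its sub-LRUs, computed once
 * before a batch of LRU juggles instead of asking slabs.c for
 * mem_requested. */
static uint64_t lru_total_bytes(int id) {
    return sizes_bytes[id | HOT_LRU] + sizes_bytes[id | WARM_LRU] +
           sizes_bytes[id | COLD_LRU] + sizes_bytes[id | TEMP_LRU];
}
```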

---

The slab page mover algorithm would fill memory until it was evicting
for a few seconds before jumping to life and shoveling some pages into
the global pool. I swore I was going to fix this post-release, but I
had a moment of inspiration after finding some code from another
branch that did half the work. After a bunch of stupid bugs it seems
to work.
-o slab_automove_freeratio=0.N is now an option. This is a *ratio of
total memory*, so don't set it too high.
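A back-of-envelope example of what the ratio means (the arithmetic is
illustrative, not lifted from the patch): with 1GB of memory and 1MB
slab pages, slab_automove_freeratio=0.01 targets roughly ten free
pages.

```c
#include <stdio.h>

int main(void) {
    double mem_limit = 1024.0 * 1024 * 1024; /* -m 1024, in bytes */
    double freeratio = 0.01;                 /* -o slab_automove_freeratio=0.01 */
    double page_size = 1024 * 1024;          /* 1MB slab pages */

    /* Pages the mover tries to keep in the global free pool: */
    unsigned int free_target =
        (unsigned int)(mem_limit * freeratio / page_size);
    printf("keep ~%u pages free\n", free_target); /* -> 10 */
    return 0;
}
```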

---

A couple of TODO items are left for a new issue I thought of, and the
hardcoded memory buffer size should be fixed.
The "free and re-init" logic also needs to change to use a boolean in
case any related option changes.

---

The page mover didn't have a reference to the storage object, so items
lost during page move transitions weren't being decremented from
storage.

---

converts the python script to C, more or less.

---

Memory chunk chains would simply stitch multiple chunks of the highest
slab class together. If your item was 17k and the chunk limit is 16k,
the item would use 32k of space instead of a bit over 17k.
This refactor simplifies the slab allocation path and pulls the
allocation of chunks into the upload process. A "large" item gets a
small chunk assigned as an object header, rather than attempting to
inline a slab chunk into a parent chunk. It then gets chunks
individually allocated and added into the chain while the object
uploads.
This solves a lot of issues:
1) When assembling new, potentially very large items, we don't have to
sit and spin evicting objects all at once. If there are 20 16k chunks
in the tail and we allocate a 1 meg item, the new item will evict one
of those chunks in between each read, rather than trying to guess how
many loops to run before giving up. Very large objects take time to
read from the socket anyway.
2) Simplifies code around the initial chunk. Originally, embedding
data into the top chunk while also building the chain required a good
amount of fiddling. (Though this might flip back to embedding the
initial chunk if I can clean it up a bit more.)
3) Pulling chunks individually means the slabber code can be flattened
to not think about chunks aside from freeing them, which culled a lot
of code and removed branches from a hot path.
4) The size of the final chunk is naturally set to the remaining
amount of bytes that need to be stored, which means chunks from
another slab class can be pulled to "cap off" a large item, reducing
memory overhead.
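A rough sketch of the per-read chaining in point 1; struct chunk and
slabs_alloc_chunk() are invented stand-ins for the real item chunk
code:

```c
#include <stddef.h>

#define CHUNK_MAX (16 * 1024)   /* e.g. a 16k chunk size limit */

struct chunk {
    struct chunk *next;
    unsigned int size;          /* usable bytes in this chunk */
    unsigned int used;
};

/* Assumed allocator: returns a chunk of up to max_bytes, evicting at
 * most one tail chunk if it must. */
extern struct chunk *slabs_alloc_chunk(unsigned int max_bytes);

/* Grow the chain as data arrives from the socket.  The final chunk is
 * sized to the remaining bytes, so it can come from a smaller slab
 * class and "cap off" the item. */
static struct chunk *chain_next_chunk(struct chunk *tail, size_t remaining) {
    unsigned int want =
        remaining > CHUNK_MAX ? CHUNK_MAX : (unsigned int)remaining;
    struct chunk *nch = slabs_alloc_chunk(want);
    if (nch == NULL)
        return NULL;            /* caller aborts the upload */
    nch->next = NULL;
    nch->used = 0;
    tail->next = nch;
    return nch;
}
```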

---

Also fixes the new LRU algorithm to balance by total bytes used rather
than total chunks used, since chunk counts aren't tracked for
multi-chunk items.
Also fixes a bug where the LRU limit wasn't being applied to HOT_LRU,
plus some cleanup from previous commits.

---

Allows dynamically increasing the memory limit of a running system if
memory isn't being preallocated.
If `-o modern` is in use, memory usage can also be lowered
dynamically: pages are free()'d back to the OS via the slab rebalancer
as memory frees up. This does not guarantee the OS will actually give
the memory back to other applications; that depends on how the OS
handles memory.
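For the runtime side, the ASCII protocol's "cache_memlimit <megabytes>"
command is the knob that pairs with this behavior (whether it landed
in exactly this commit is an assumption); e.g. "cache_memlimit 2048"
raises the limit to 2048 megabytes.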

---

"-o slab_sizes=100-200-300-400-500" will create 5 slab classes of those
specified sizes, with the final class being item_max_size.
Using the new online stats sizes command, it's possible to determine if the
typical factoral slab class growth rate doesn't align well with how items are
stored.
This is dangerous unless you really know what you're doing. If your items have
an exact or very predictable size this makes a lot of sense. If they do not,
the defaults are safer.

---

Previously the slab mover would evict items if the new chunk was
within the slab page being moved. Now it does an inline reclaim of the
chunk and retries until it runs out of memory.

---

If any slab class has more than two pages' worth of free chunks,
attempt to free one page back to a global pool.
This creates a new concept: a slab page move destination of "0", the
global page pool. Pages can be reassigned out of that pool during
allocation.
Combined with the item rescuing from the previous patch, we can safely
shuffle pages back to the reassignment pool as chunks free up
naturally. This should be a safe default going forward. Users should
also be able to decide to free or move pages based on eviction
pressure; that is coming in another commit.
This also fixes a calculation of the NOEXP LRU size, and completely
removes the old slab automover thread. Slab automove decisions are now
part of the LRU maintainer thread.
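A small sketch of the release heuristic; the struct fields and helper
name are illustrative, not the real automove code:

```c
#define SLAB_GLOBAL_PAGE_POOL 0   /* destination "0": the global pool */

struct slabclass_view {           /* illustrative, not slabclass_t */
    unsigned int perslab;         /* chunks per page */
    unsigned int free_chunks;     /* chunks sitting on the freelist */
};

/* More than two pages' worth of free chunks?  Then one page can go
 * back to the global pool, to be reassigned to any class later. */
static int should_release_page(const struct slabclass_view *p) {
    return p->free_chunks > p->perslab * 2;
}
```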

---

Sadly, the only way to make the eviction case fast enough is to inline
it.
This finally deletes the old item_alloc code now that I'm not
intending to reuse it.
Also removes the condition wakeup for the background thread; it now
runs on a timer and meters its aggressiveness by how much shuffling is
going on (see the sketch below).
Also fixes a segfault in lru_pull_tail(), which was unlinking `it`
instead of `search`.
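A sketch of the timer metering, assuming invented names and constants
(lru_maintainer_juggle, the sleep bounds); the real thread differs in
detail:

```c
#include <unistd.h>

#define MIN_SLEEP 1000        /* usec */
#define MAX_SLEEP 1000000

/* Assumed worker: returns how many items were shuffled this pass. */
extern int lru_maintainer_juggle(void);

static void lru_maintainer_loop(volatile int *do_run) {
    useconds_t to_sleep = MAX_SLEEP;
    while (*do_run) {
        int did_moves = lru_maintainer_juggle();
        if (did_moves == 0) {
            /* Idle: back off toward the maximum sleep. */
            if (to_sleep < MAX_SLEEP)
                to_sleep += MIN_SLEEP;
        } else {
            /* Busy: wake up more often. */
            to_sleep /= 2;
            if (to_sleep < MIN_SLEEP)
                to_sleep = MIN_SLEEP;
        }
        usleep(to_sleep);
    }
}
```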

---

The basics work, but tests still do not pass.
A background thread wakes up once per second, or when signaled. It is
signaled if a slab class gets an allocation request and has fewer than
N chunks free.
The background thread shuffles the LRUs: HOT, WARM, COLD. New items
enter HOT; HOT and WARM flow into COLD, and active items in COLD flow
back to WARM. Evictions are pulled from COLD. (A sketch of the flow
follows below.)
item_update calls no longer do anything (and need to be fixed to tick
it->time). Items are reshuffled within or between LRUs as they reach
the bottom.
The ratios of HOT/WARM memory are hardcoded, as are the low/high
watermarks.
The thread is not fast enough right now, so sets cannot block on it.
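A sketch of the flow, heavily simplified from lru_pull_tail() with
invented helper names:

```c
enum lru { HOT, WARM, COLD };

extern int lru_bytes(enum lru l);    /* bytes in this sub-LRU */
extern int lru_limit(enum lru l);    /* hardcoded ratio of class memory */
extern int tail_item_was_active(enum lru l);
extern void move_tail_item(enum lru from, enum lru to);

static void juggle_class(void) {
    /* New items land in HOT; HOT and WARM overflow into COLD. */
    if (lru_bytes(HOT) > lru_limit(HOT))
        move_tail_item(HOT, COLD);
    if (lru_bytes(WARM) > lru_limit(WARM))
        move_tail_item(WARM, COLD);
    /* Items touched while in COLD are promoted back to WARM;
     * evictions only ever come off the COLD tail. */
    if (tail_item_was_active(COLD))
        move_tail_item(COLD, WARM);
}
```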

---

Expansion requires switching to a global lock temporarily, so all
buckets have a covered read lock. The slab rebalancer is paused during
hash table expansion.
Internal item "trylocks" are always issued and tracked, since the hash
power variable can change out from under the caller (see the sketch
below).
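A sketch of a hash-power-aware trylock; the lock-table layout and
names are simplified relative to the real code:

```c
#include <pthread.h>
#include <stdint.h>

extern pthread_mutex_t *item_locks;       /* power-of-two lock table */
extern unsigned int item_lock_hashpower;  /* grows during expansion */

#define hashsize(n) ((uint32_t)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

/* Always a trylock: the hash power (and thus the correct lock) can
 * change out from under us mid-expansion, so on failure the caller
 * re-reads the power and retries rather than blocking. */
static pthread_mutex_t *item_trylock(uint32_t hv) {
    pthread_mutex_t *lock = &item_locks[hv & hashmask(item_lock_hashpower)];
    if (pthread_mutex_trylock(lock) == 0)
        return lock;
    return NULL;
}
```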

---

Slab memory assignment used to lazily split a new page into chunks as
memory was requested. Now pages are split up front, so all the related
lazy-split code is dropped.
This cuts the memory assignment hot path a tiny bit, so that's
exciting (see the sketch below).
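A sketch of the up-front split; the struct and freeing helper are
simplified stand-ins:

```c
#include <stddef.h>

struct slabclass_sketch {      /* illustrative, not the real slabclass */
    unsigned int size;         /* chunk size for this class */
    unsigned int perslab;      /* chunks per page */
};

/* Assumed helper: pushes one chunk onto the class freelist. */
extern void push_free_chunk(struct slabclass_sketch *p, void *ptr);

/* Split a freshly assigned page into its freelist immediately, so the
 * allocation hot path never has to carve chunks out of a page. */
static void split_page(struct slabclass_sketch *p, char *page) {
    unsigned int i;
    for (i = 0; i < p->perslab; i++)
        push_free_chunk(p, page + (size_t)i * p->size);
}
```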

---

Add human-readable strings to the errors for slabs reassign. Also
prevent reassigning memory when the source and destination are the
same class.

---

Adds a "slabs reassign src dst" manual command, and a thread to safely
process slab moves in the background.
- The slab freelist is now a linked list, reusing the item structure.
- If -o slab_reassign is enabled, an extra background thread is
started.
- The thread attempts to safely free up items when it's been told to
move a page from one slab class to another.
-o slab_automove is stubbed.
There are some limitations. Most notable is that you cannot repeatedly
move pages around without items first using up the memory: slabs with
newly assigned memory work off of a pointer, handing out chunks
individually. We would need to change that to quickly split chunks for
all newly assigned pages into that slab's freelist.
Further testing is required to ensure this is possible without
impacting performance.
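As a usage sketch: "slabs reassign 1 4" (over the text protocol) asks
the background thread to move one page from slab class 1 to class 4,
while "slabs automove 1" is the stubbed toggle for automatic moves;
treat the exact replies as illustrative.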

---

The old code was unfinished, had no test coverage, and was not quite
what we'll end up with in the future.
Slab reassignment will happen in earnest soon, but for now we should
stop confusing users.

---

(dustin) I made some changes to the original growth code to pass in
the required size.

---

The subcommand is not necessarily a null-terminated string (with the
binary protocol it certainly isn't), so the subcommand would not be
recognized due to strcmp() failing.
Tested with both the text and binary protocols.
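A sketch of the general shape of the fix: compare against the token's
length instead of assuming NUL termination (the helper name is
invented):

```c
#include <string.h>

/* Binary-protocol buffers are not NUL terminated, so strcmp() on the
 * subcommand can read garbage; bound the comparison by length. */
static int subcommand_is(const char *subcmd, size_t len, const char *want) {
    size_t wlen = strlen(want);
    return len == wlen && strncmp(subcmd, want, wlen) == 0;
}
```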

---

wasteful function calls (more to come).

---

abstraction by the callback.

---

This and the previous patch slightly reduce user CPU time, especially
during heavy evictions.
git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@739 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

slabs_alloc() internally calls slabs_clsid(), so an eviction case would crawl the list of slab classes three times.
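A sketch of the shape of the fix: look the class id up once and pass
it along (signatures simplified from the era's slabs.h):

```c
#include <stddef.h>

extern unsigned int slabs_clsid(size_t size);            /* 0 = no fit */
extern void *slabs_alloc(size_t size, unsigned int id);  /* id passed in */

static void *alloc_for_item(size_t ntotal) {
    unsigned int id = slabs_clsid(ntotal);  /* one crawl of the classes */
    if (id == 0)
        return NULL;
    return slabs_alloc(ntotal, id);         /* id reused, not re-looked-up */
}
```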
git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@738 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

Initial support for Solaris.
git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@724 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@596 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

new files, not the modified ones.)
git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@509 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@492 b0b603af-a30f-0410-a34e-baf09ae79d0b

---

git-svn-id: http://code.sixapart.com/svn/memcached/trunk/server@468 b0b603af-a30f-0410-a34e-baf09ae79d0b