| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
On 32-bit hardware with different pointer/slab class sizes, the tests would
fail. Made a few adjustments to ensure reassign rescues happen and to keep
items away from the default slab class borders.
This makes the tests pass, but it needs further improvements for reliability,
e.g. "fill until evicts", counting slab pages for reassignment, etc.
|
|
|
|
|
|
|
|
|
|
|
| |
Single-CPU VM builders could fail:
- spawn LRU crawler thread.
- signal the LRU crawler.
- only then does the crawler thread wait on the condition.
- the crawler misses the signal and sits forever.
Might also want to move the "stats.lru_crawler_running" bit to be updated when
the crawler thread picks up the work to do, somehow.
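The usual fix for this kind of lost wakeup is to pair the condition with a
predicate checked under the mutex, so a signal sent before the thread reaches
its wait is not lost. A minimal sketch (names are illustrative, not memcached's):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool work_pending = false;   /* predicate: survives a missed signal */

    /* Signaler: set the predicate before signaling. */
    void wake_crawler(void) {
        pthread_mutex_lock(&lock);
        work_pending = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    /* Waiter: re-check the predicate; never blocks if work already arrived. */
    void *crawler_thread(void *arg) {
        pthread_mutex_lock(&lock);
        while (!work_pending)
            pthread_cond_wait(&cond, &lock);
        work_pending = false;
        pthread_mutex_unlock(&lock);
        /* ... do the crawl ... */
        return arg;
    }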
|
|
|
|
|
|
|
|
|
|
|
| |
mem_alloced was being increased every time a page was assigned out of either
malloc or the global page pool. This means total_malloced will inflate forever
as pages are reused, and once limit_maxbytes is surpassed it will stop
attempting to malloc more memory.
The result is that we would stop malloc'ing new memory too early if page
reclaim happens before the whole thing fills. The test already triggered this
condition, so adding the extra checks was trivial.
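A rough sketch of the intended accounting, with invented names and sizes: the
malloc counter only grows when memory actually comes from malloc, never when a
page is reused from the global pool.

    #include <stddef.h>
    #include <stdlib.h>

    #define PAGE_SIZE (1024 * 1024)

    static size_t mem_malloced = 0;            /* bytes ever obtained via malloc */
    static size_t mem_limit    = 64UL * PAGE_SIZE;

    static void  *global_pool[64];             /* stand-in for the global page pool */
    static int    global_pool_count = 0;

    static void *get_page(void) {
        if (global_pool_count > 0)             /* reuse a pooled page: no accounting */
            return global_pool[--global_pool_count];

        if (mem_malloced + PAGE_SIZE > mem_limit)
            return NULL;                       /* malloc budget exhausted */

        void *page = malloc(PAGE_SIZE);
        if (page != NULL)
            mem_malloced += PAGE_SIZE;         /* only real mallocs grow the counter */
        return page;
    }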
|
|
|
|
|
|
| |
Previously the slab mover would evict items if the newly allocated chunk landed
within the slab page being moved. Now it does an inline reclaim of that chunk
and retries until it runs out of memory.
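A sketch of that retry loop, assuming hypothetical helpers: the reclaim marks
the chunk as belonging to the page being cleared so it won't be handed back
out, and the loop asks for another chunk until one lands outside the page or
the class runs dry.

    #include <stdbool.h>
    #include <stddef.h>

    struct chunk_stub { char pad[64]; };

    /* Hypothetical helpers standing in for the real slab/item calls. */
    extern struct chunk_stub *alloc_chunk_from_class(unsigned int clsid);
    extern void reclaim_into_moving_page(struct chunk_stub *chunk);

    static bool chunk_in_page(void *chunk, void *page_start, size_t page_len) {
        return (char *)chunk >= (char *)page_start &&
               (char *)chunk <  (char *)page_start + page_len;
    }

    static struct chunk_stub *alloc_outside_page(unsigned int clsid,
                                                 void *page_start, size_t page_len) {
        for (;;) {
            struct chunk_stub *chunk = alloc_chunk_from_class(clsid);
            if (chunk == NULL)
                return NULL;                     /* out of memory: caller evicts */
            if (!chunk_in_page(chunk, page_start, page_len))
                return chunk;                    /* usable destination chunk */
            reclaim_into_moving_page(chunk);     /* inside the moving page: reclaim it */
        }
    }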
|
|
|
|
|
|
| |
A gross oversight: two conditions were packed into the same variable. Now we
can tell whether we're evicting because we're hitting the bottom of the free
memory pool, or because we keep trying to rescue items into the same page as
the one being cleared.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Class 255 is now a legitimate class, used by the NOEXP LRU when the
expirezero_does_not_evict flag is enabled. Instead, we now force a single
ITEM_SLABBED bit when a chunk is returned to the slabber, and
ITEM_SLABBED|ITEM_FETCHED means the chunk has been cleared for a page move.
item_alloc overwrites the chunk's flags on set. The only weirdness was
slab_free OR'ing in the ITEM_SLABBED bit. I tracked that down to a commit in
2003 titled "more debugging" and can't come up with a good enough excuse for
preserving an item's flags when it's been returned to the free memory pool. So
now we overload the flag meaning.
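A small sketch of the flag overloading; the bit values here are placeholders,
not memcached's actual definitions.

    #include <stdbool.h>
    #include <stdint.h>

    #define ITEM_LINKED  (1 << 0)
    #define ITEM_SLABBED (1 << 2)   /* chunk is on the slab class free list */
    #define ITEM_FETCHED (1 << 3)   /* normally "item was read since last write" */

    /* Returned to the slabber: exactly ITEM_SLABBED, nothing preserved. */
    static inline void mark_freed(uint16_t *it_flags) {
        *it_flags = ITEM_SLABBED;
    }

    /* Cleared during a page move: overload ITEM_FETCHED on top of ITEM_SLABBED. */
    static inline void mark_cleared_for_move(uint16_t *it_flags) {
        *it_flags = ITEM_SLABBED | ITEM_FETCHED;
    }

    static inline bool is_cleared_for_move(uint16_t it_flags) {
        return (it_flags & (ITEM_SLABBED | ITEM_FETCHED)) ==
               (ITEM_SLABBED | ITEM_FETCHED);
    }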
|
|
|
|
|
| |
If we decide to move pages right on the chunk boundary, it's too easy to cause
flapping.
|
|
|
|
|
| |
Uses the slab_rebal struct to summarize stats, grabbing the global lock only
occasionally to fill them in instead.
|
|
|
|
|
|
| |
During an item rescue, the item size was added to the slab class total when
the new chunk was requested, and then never removed from the total if the item
was successfully rescued. Now we just always remove it from the total.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an item has neither the ITEM_SLABBED bit nor the ITEM_LINKED bit, the logic
was falling through and defaulting to MOVE_PASS. If an item has had storage
allocated via item_alloc() but hasn't completed the data upload, it will sit in
this state. With MOVE_PASS for an item in this state, if no other items trip
the busy re-scan of the page, the mover will consider the page completely
wiped even with the outstanding item.
The hilarious bit is I'd clearly thought this through: the top comment states
"if this, then this, or that"... with the "or that" logic completely missing.
Adding one line of code let it survive a 5-hour torture test, where before it
crashed after 30-60 minutes.
Leaves some handy debug code #ifdef'ed out. Also moves the memset wipe on page
move completion so it only happens if the page isn't being returned to the
global page pool, as the page allocator already does a memset and chunk-split.
Thanks to Scott Mansfield for the initial information eventually leading to
this discovery.
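A sketch of the per-chunk decision with the missing branch filled in; the enum
and helper shape are assumptions, only the flag/MOVE_PASS reasoning comes from
the message above.

    #include <stdint.h>

    #define ITEM_LINKED  (1 << 0)
    #define ITEM_SLABBED (1 << 2)

    enum move_status { MOVE_PASS, MOVE_FROM_LRU, MOVE_BUSY };

    /* Decide what to do with one chunk in the page being moved. */
    static enum move_status classify_chunk(uint16_t it_flags) {
        if (it_flags & ITEM_SLABBED)
            return MOVE_PASS;        /* already on the free list: nothing to do */
        if (it_flags & ITEM_LINKED)
            return MOVE_FROM_LRU;    /* live item: rescue or evict it */
        /* Neither bit set: storage allocated via item_alloc() but not yet
         * linked (data upload still in flight). Treat it as busy so the page
         * is re-scanned instead of being declared wiped with this chunk
         * still outstanding. */
        return MOVE_BUSY;
    }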
|
|
|
|
| |
Thanks Devon :)
|
|
|
|
|
| |
Some new variables and a change to the '1' mode. A little sad nobody noticed
I'd accidentally removed the '2' mode for a few versions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If any slab class has more than two pages' worth of free chunks, attempt to
free one page back to a global pool.
Creates a new concept of a slab page move destination of "0", which is the
global page pool. Pages can be re-assigned out of that pool during allocation.
Combined with the item rescuing from the previous patch, we can safely shuffle
pages back to the reassignment pool as chunks free up naturally. This should
be a safe default going forward. Users should also be able to decide to free
or move pages based on eviction pressure; that is coming in another commit.
This also fixes a calculation of the NOEXP LRU size, and completely removes
the old slab automover thread. Slab automove decisions will now be part of the
LRU maintainer thread.
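A minimal sketch of that decision rule, with invented field names; destination
class 0 stands for the global page pool described above.

    #include <stdbool.h>

    struct class_view {
        unsigned int id;
        unsigned int free_chunks;       /* currently free chunks in this class */
        unsigned int chunks_per_page;   /* how many chunks one page holds */
        unsigned int total_pages;
    };

    #define GLOBAL_PAGE_POOL 0          /* "destination 0" == global page pool */

    /* Pick a class holding more than two pages' worth of free chunks. */
    static bool pick_page_to_release(const struct class_view *classes, int n,
                                     unsigned int *src, unsigned int *dst) {
        for (int i = 0; i < n; i++) {
            const struct class_view *c = &classes[i];
            if (c->total_pages > 1 && c->free_chunks > 2 * c->chunks_per_page) {
                *src = c->id;
                *dst = GLOBAL_PAGE_POOL;
                return true;
            }
        }
        return false;
    }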
|
|
|
|
|
|
|
| |
We used to take the newest page in the page list and replace the oldest page
with it, so only the first page moved from a slab class was actually "old".
Instead, burn the slight CPU cost to shuffle all of the pointers down one.
Now we always chew through the oldest page.
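A sketch of the shuffle, assuming a plain array of page pointers with the
oldest page at index 0.

    #include <string.h>

    /* Remove the oldest page (index 0) and shift the rest down one slot, so
     * index 0 always refers to the next-oldest page. Names are illustrative. */
    static void *take_oldest_page(void **slab_list, unsigned int *slabs) {
        if (*slabs == 0)
            return NULL;
        void *oldest = slab_list[0];
        memmove(&slab_list[0], &slab_list[1], (*slabs - 1) * sizeof(void *));
        (*slabs)--;
        return oldest;
    }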
|
|
|
|
|
|
|
|
|
|
|
| |
During a slab page move, items are typically ejected regardless of their
validity. Now, if an item is valid and free chunks are available in the same
slab class, copy the item over and replace it.
It's up to external systems to try to ensure free chunks are available before
moving a slab page. If there is no memory, it will simply evict items as
normal.
Also adds counters so we can finally tell how often these cases happen.
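A sketch of the rescue path with assumed structures and helpers: grab a free
chunk in the same class, copy the live item into it, swap the hash/LRU links
over, then let the old chunk be reclaimed.

    #include <stdbool.h>
    #include <string.h>

    struct item_stub {
        unsigned int clsid;
        unsigned int total_len;   /* full item length; destination chunk assumed big enough */
        /* ... key, data ... */
    };

    /* Hypothetical helpers standing in for the real slab/item calls. */
    extern struct item_stub *free_chunk_from_class(unsigned int clsid);
    extern void relink_item(struct item_stub *old_it, struct item_stub *new_it);
    extern void evict_item(struct item_stub *it);

    static bool rescue_or_evict(struct item_stub *it) {
        struct item_stub *new_it = free_chunk_from_class(it->clsid);
        if (new_it == NULL) {
            evict_item(it);                  /* no free memory: evict as before */
            return false;
        }
        memcpy(new_it, it, it->total_len);   /* copy the whole item */
        relink_item(it, new_it);             /* point hash table/LRU at the copy */
        return true;                         /* old chunk can now be reclaimed */
    }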
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test is a port of a Go test submitted by Scott Mansfield.
There used to be an "angry birds mode" for slabs_automove, which attempts to
force a slab move from "any" slab into the one which just had an eviction.
This is an imperfect but fast way of responding to shifts in memory
requirements.
This change adds it back in, plus a test which very quickly attempts to set
data via noreply. This isn't the end of improvements here; this commit is a
starting point.
|
|
|
|
|
| |
Add a handful of missing stats docs (doh) and shorten the STATS_LOCK hold a
little bit. The rest seems fine.
|
| |
|
|
|
|
|
| |
Update documentation to more accurately reflect the value that should be
passed as the parameter on the command line.
|
|
|
|
|
|
| |
'buf' could be freed after calling atoi, providing a gentler way to exit
Signed-off-by: Yongyue Sun <abioy.sun@gmail.com>
|
| |
URL: https://code.google.com/p/memcached/issues/detail?id=310
|
| |
|
|
|
|
|
|
|
|
| |
Before 6ba9aa2771adcf785fe3fde03cd71832db15b086, multiple -l arguments
were ignored, and the last option passed was the one actually used.
This changes the behaviour to silently ignore duplicate -l options rather
than crash and burn on the bind() call.
|
| |
This is (these are) automatically started and stopped by the main job.
|
| |
|
|
|
|
| |
This is necessary for an Upstart job, because the call out to rm -f causes Perl to fork off a process, and Upstart thinks that new process is the main one (not the exec'd memcached process later on).
|
| |
|
|
|
|
|
|
| |
... there should always be an 'automake' alias, and we haven't had an
"unsupported" version in probably ten years. Hopefully this stops systems with
upgraded automakes from breaking every time.
|
|
|
|
|
| |
The histogram used buckets 1-60 instead of 0-59, so bucket 60 could cause a
segfault.
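A generic sketch of the off-by-one: with 60 buckets the valid indexes are
0-59, so the value must be clamped before indexing (array size and names are
illustrative).

    #include <stdint.h>

    #define NUM_BUCKETS 60

    static uint64_t histo[NUM_BUCKETS];

    static void histo_add(unsigned int bucket) {
        if (bucket >= NUM_BUCKETS)      /* bucket 60 would run off the array */
            bucket = NUM_BUCKETS - 1;
        histo[bucket]++;
    }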
|
|
|
|
| |
Slab class 255 is now valid, and it's a uint8_t.
|
|
|
|
|
| |
We now use up to exactly clsid 255, which is the maximum value of a byte, so
the assertion can't fail.
|
|
|
|
|
|
|
|
|
|
| |
This test requires that the juggler thread runs at all before the stats check
happens. I've tried running this on an rPi1 and can't reproduce the race, but
for some reason Solaris amd64 does. This is likely due to the usleep not
working as expected.
Unfortunately I don't have direct access to a Solaris host, so this is the
best I can do for now. The juggler does eventually wake up, so I'm unconcerned.
|
|
|
|
|
| |
None of my machines could repro a crash, but it's definitely wrong :/ Very
sad.
|
|
|
|
|
| |
Every time slab_automove ran, it would segfault immediately, since the call
out into items.c would overwrite its stack.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Data going into the highest slab class was left unallocated. Thanks to pyry
for the repro case:
perl -e 'use Cache::Memcached;$memd = new Cache::Memcached {
servers=>["127.0.0.1:11212"]};for(20..1000){print "$_\n";$memd->set("fo2$_",
"a"x1024)};'
(in a loop)
with:
./memcached -v -m 32 -p 11212 -f 1.012
This serves as a note to turn this into a test.
|
|
|
|
|
|
|
| |
Wasn't sending the condition signal after a refactor :(
Also adds some stats to inspect how much work the LRU crawler is doing, and
removes some printf noise from the LRU maintainer.
|
|
|
|
|
|
| |
This could/should be an atomic. Previously all write mutations were wrapped
with cache_lock, but that's no longer the case. Just enforce consistency
around the hash_items counter, which is used for hash table expansion.
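A sketch of the two obvious ways to keep the counter consistent once the big
lock no longer covers it: a small dedicated mutex (roughly what this
describes) or a C11 atomic (the "could/should be an atomic" option). Names are
illustrative.

    #include <pthread.h>
    #include <stdatomic.h>

    /* Option 1: a dedicated lock just for the counter. */
    static pthread_mutex_t hash_items_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned int hash_items = 0;

    static void hash_items_inc(void) {
        pthread_mutex_lock(&hash_items_lock);
        hash_items++;
        /* the hash table expansion check would read hash_items under this lock */
        pthread_mutex_unlock(&hash_items_lock);
    }

    /* Option 2: a C11 atomic counter; the bump itself needs no lock. */
    static atomic_uint hash_items_atomic;

    static void hash_items_inc_atomic(void) {
        atomic_fetch_add_explicit(&hash_items_atomic, 1, memory_order_relaxed);
    }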
|
|
|
|
|
|
|
| |
The fork is racy, and the LRU crawler or maintainer threads end up not
starting under daemonization, so we start them post-fork now.
Thanks pyry for the report!
|
|
|
|
|
|
|
|
|
| |
If a thread is allowed to go to sleep, it can be woken up as soon as the lock
is freed. If we spinlock, the scheduler can't help us and threads will
randomly burn through their timeslice until the thread actually holding the
lock finishes its work.
In my benchmarks, killing the spinlock only makes things better.
|
|
|
|
|
|
|
|
| |
Separate the start function from what was the string parsing, and allow
passing in the 'remaining' value as an argument.
Also adds a (not yet configurable) setting for how many crawls to run per
sleep, to raise the default aggressiveness of the crawler.
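A sketch of the split, with invented signatures: the protocol handler only
parses, then hands the already parsed 'remaining' count to a separate start
function.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    enum crawler_result { CRAWLER_OK, CRAWLER_BADCLASS, CRAWLER_ERROR };

    /* Start the crawler on one slab class with a per-class item budget. */
    static enum crawler_result crawler_start(uint32_t clsid, uint32_t remaining) {
        /* ... lock the class, seed the crawler item, signal the thread ... */
        (void)clsid; (void)remaining;
        return CRAWLER_OK;
    }

    /* Protocol-side wrapper: does only the string parsing, then delegates. */
    static enum crawler_result crawler_crawl_command(const char *classes_str,
                                                     uint32_t remaining) {
        char *copy = strdup(classes_str);
        if (copy == NULL)
            return CRAWLER_ERROR;
        enum crawler_result res = CRAWLER_OK;
        for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ",")) {
            res = crawler_start((uint32_t)strtoul(tok, NULL, 10), remaining);
            if (res != CRAWLER_OK)
                break;
        }
        free(copy);
        return res;
    }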
|