* make slab reassign tests more reliable (dormando, 2015-11-19, 1 file, -13/+20)
  On 32-bit hardware with different pointer/slab class sizes, the tests would fail. Made a few adjustments to ensure reassign rescues happen and to keep items away from the default slab class borders. This makes the tests pass, but reliability still needs further work, e.g. "fill until evicts" and counting slab pages for reassignment.
* 'lru_crawler enable' blocks until ready (dormando, 2015-11-19, 1 file, -1/+15)
  Single-CPU VM builders could fail:
  - spawn LRU crawler thread
  - signal LRU crawler
  - LRU crawler thread now waits on condition
  - crawler thread misses the signal and sits forever
  Might also want to move the "stats.lru_crawler_running" bit so it is updated when the crawler thread actually picks up the work. A sketch of the blocking handshake follows below.
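  A minimal sketch of a lost-wakeup-safe handshake of the kind described above, assuming plain pthreads; the names here are illustrative and not memcached's actual symbols:

      /* Sketch: the enabling thread blocks until the crawler thread is
       * actually waiting, so the wake-up signal can never be missed. */
      #include <pthread.h>
      #include <stdbool.h>
      #include <stddef.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
      static bool crawler_ready = false;  /* set once the crawler is waiting */
      static bool work_pending  = false;  /* set by 'lru_crawler enable' */

      static void *crawler_thread(void *arg) {
          (void)arg;
          pthread_mutex_lock(&lock);
          crawler_ready = true;
          pthread_cond_signal(&cond);          /* tell the enabler we are up */
          while (!work_pending)                /* predicate loop: no lost wakeups */
              pthread_cond_wait(&cond, &lock);
          pthread_mutex_unlock(&lock);
          /* ... crawl ... */
          return NULL;
      }

      static void enable_crawler(pthread_t *tid) {
          pthread_create(tid, NULL, crawler_thread, NULL);
          pthread_mutex_lock(&lock);
          while (!crawler_ready)               /* block until the thread is ready */
              pthread_cond_wait(&cond, &lock);
          work_pending = true;
          pthread_cond_signal(&cond);
          pthread_mutex_unlock(&lock);
      }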
* fix over-inflation of total_malloced (dormando, 2015-11-18, 2 files, -2/+12)
  mem_alloced was getting increased every time a page was assigned out of either malloc or the global page pool. This means total_malloced will inflate forever as pages are reused, and once limit_maxbytes is surpassed memcached stops attempting to malloc more memory. The result is that we would stop malloc'ing new memory too early if page reclaim happens before the whole thing fills. The test already caused this condition, so adding the extra checks was trivial.
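  A minimal sketch of the accounting rule described above, assuming hypothetical helper and field names rather than memcached's exact internals:

      /* Sketch: total_malloced only grows when memory is actually malloc'ed,
       * never when a page is handed back out of the global pool. */
      #include <stdlib.h>
      #include <stddef.h>

      static size_t mem_alloced = 0;                 /* bytes truly malloc'ed */
      static size_t mem_limit   = 64 * 1024 * 1024;  /* stand-in for limit_maxbytes */

      extern void *pop_global_page_pool(void);       /* hypothetical: NULL if empty */

      static void *get_page(size_t page_size) {
          void *page = pop_global_page_pool();       /* reuse does not re-count */
          if (page == NULL) {
              if (mem_alloced + page_size > mem_limit)
                  return NULL;                       /* respect the memory limit */
              page = malloc(page_size);
              if (page != NULL)
                  mem_alloced += page_size;          /* counted once, at malloc time */
          }
          return page;
      }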
* try harder to save items (dormando, 2015-11-18, 6 files, -36/+57)
  Previously the slab mover would evict items if the new chunk was within the slab page being moved. Now it does an inline reclaim of the chunk and keeps trying until it runs out of memory.
* split rebal_evictions into _nomem and _samepage (dormando, 2015-11-18, 5 files, -12/+24)
  Gross oversight putting two conditions into the same variable. Now we can tell if we're evicting because we're hitting the bottom of the free memory pool, or because we keep trying to rescue items into the same page as the one being cleared.
* stop using slab class 255 for page mover (dormando, 2015-11-18, 1 file, -6/+8)
  Class 255 is now a legitimate class, used by the NOEXP LRU when the expirezero_does_not_evict flag is enabled. Instead, we now force a single ITEM_SLABBED bit when a chunk is returned to the slabber, and ITEM_SLABBED|ITEM_FETCHED means it's been cleared for a page move. item_alloc overwrites the chunk's flags on set. The only weirdness was slab_free |='ing in the ITEM_SLABBED bit; I tracked that down to a commit in 2003 titled "more debugging" and can't come up with a good enough excuse for preserving an item's flags once it's been returned to the free memory pool. So now we overload the flag meaning.
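  A rough sketch of the flag convention, with bit values written in memcached's style but intended only as an illustration:

      /* ITEM_SLABBED alone marks a chunk returned to the slabber;
       * ITEM_SLABBED|ITEM_FETCHED marks a chunk cleared for a page move.
       * item_alloc() later overwrites the flags entirely on set. */
      #include <stdint.h>
      #include <stdbool.h>

      #define ITEM_LINKED  1
      #define ITEM_SLABBED 4
      #define ITEM_FETCHED 8

      typedef struct { uint8_t it_flags; } item;

      static void mark_freed(item *it) {
          it->it_flags = ITEM_SLABBED;                 /* overwrite, never |= */
      }

      static void mark_cleared_for_move(item *it) {
          it->it_flags = ITEM_SLABBED | ITEM_FETCHED;
      }

      static bool cleared_for_move(const item *it) {
          return it->it_flags == (ITEM_SLABBED | ITEM_FETCHED);
      }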
* tune automove to require 2.5 pages of free chunks (dormando, 2015-11-18, 1 file, -1/+1)
  If we decide to move pages right at the chunk boundary it's too easy to cause flapping.
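  A small sketch of the kind of threshold check implied here; the 2.5-page margin is from the commit, the surrounding names are assumptions:

      /* Only treat a class as a page-move source when it holds well more than
       * a page boundary's worth of free chunks, to avoid flapping. */
      #include <stdbool.h>

      struct slab_class_stats {
          unsigned int free_chunks;
          unsigned int chunks_per_page;
      };

      static bool can_give_up_a_page(const struct slab_class_stats *c) {
          return c->free_chunks > (c->chunks_per_page * 5) / 2;  /* 2.5 pages */
      }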
* call STATS_LOCK() less in slab mover (dormando, 2015-11-18, 2 files, -13/+17)
  Uses the slab_rebal struct to summarize stats locally, and only occasionally grabs the global lock to fold them in.
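  A minimal sketch of the batching pattern, assuming illustrative stat and struct names:

      /* Accumulate rebalance counters locally, then fold them into the global
       * stats under a single lock acquisition instead of locking per event. */
      #include <pthread.h>

      static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

      struct global_stats { unsigned long rescues, evictions_nomem; };
      struct rebal_stats  { unsigned long rescues, evictions_nomem; };

      static struct global_stats stats;

      static void rebal_flush_stats(struct rebal_stats *r) {
          pthread_mutex_lock(&stats_lock);            /* one lock per batch */
          stats.rescues         += r->rescues;
          stats.evictions_nomem += r->evictions_nomem;
          pthread_mutex_unlock(&stats_lock);
          r->rescues = r->evictions_nomem = 0;        /* reset the local batch */
      }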
* "mem_requested" from "stats slabs" is now accuratedormando2015-11-181-3/+4
| | | | | | During an item rescue, item size was being added to the slab class when the new chunk requested, and then not removed again from the total if the item was successfully rescued. Now just always remove from the total.
* fix memory corruption in slab page mover (dormando, 2015-11-18, 1 file, -2/+23)
  If an item has neither the ITEM_SLABBED bit nor the ITEM_LINKED bit, the logic was falling through and defaulting to MOVE_PASS. An item that has had storage allocated via item_alloc() but hasn't completed its data upload sits in exactly that state. With MOVE_PASS for an item in that state, if no other items trip the busy re-scan of the page, the mover considers the page completely wiped even with the outstanding item.

  The hilarious bit is I'd clearly thought this through: the top comment states "if this, then this, or that...", with the "or that" logic completely missing. Adding one line of code let it survive a 5-hour torture test, where before it crashed after 30-60 minutes. Leaves some handy debug code #ifdef'ed out.

  Also moves the memset wipe on page move completion to only happen if the page isn't being returned to the global page pool, since the page allocator does its own memset and chunk split.

  Thanks to Scott Mansfield for the initial information eventually leading to this discovery.
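  A rough sketch of the decision table described above; the enum and flag names follow the commit's vocabulary, but the code is illustrative rather than memcached's actual mover:

      /* A chunk that is neither free (ITEM_SLABBED) nor linked (ITEM_LINKED)
       * belongs to an item still being uploaded, so the page stays busy. */
      enum move_status { MOVE_PASS, MOVE_FROM_SLAB, MOVE_FROM_LRU, MOVE_BUSY };

      #define ITEM_LINKED  1
      #define ITEM_SLABBED 4

      static enum move_status classify_chunk(unsigned char it_flags, int refcount) {
          if (it_flags & ITEM_SLABBED)
              return MOVE_FROM_SLAB;                  /* free chunk: unlink it */
          if (it_flags & ITEM_LINKED)
              return refcount == 0 ? MOVE_FROM_LRU : MOVE_BUSY;
          /* allocated via item_alloc() but not yet linked: upload in flight */
          return MOVE_BUSY;                           /* the missing "or that" case */
      }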
* fix off by one in slab shuffling (dormando, 2015-11-18, 1 file, -1/+1)
  Thanks Devon :)
* documentation for slab rebal updates (dormando, 2015-11-18, 1 file, -1/+11)
  Covers some new variables and a change to the '1' mode. A little sad nobody noticed I'd accidentally removed the '2' mode for a few versions.
* first half of new slab automover (dormando, 2015-11-18, 6 files, -125/+73)
  If any slab class has more than two pages' worth of free chunks, attempt to free one page back to a global pool.

  Creates a new concept of slab page move destination "0", which is a global page pool. Pages can be re-assigned out of that pool during allocation. Combined with item rescuing from the previous patch, we can safely shuffle pages back to the reassignment pool as chunks free up naturally. This should be a safe default going forward. Users should also be able to decide to free or move pages based on eviction pressure; that is coming in another commit.

  This also fixes a calculation of the NOEXP LRU size, and completely removes the old slab automover thread. Slab automove decisions are now part of the lru maintainer thread.
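  A minimal sketch of the automove decision, assuming illustrative names; destination class 0 stands for the global page pool as in the commit:

      /* Classes holding more than two pages' worth of free chunks give one
       * page back to the global pool (class GLOBAL_PAGE_POOL), where any
       * class can later claim it during allocation. */
      #define GLOBAL_PAGE_POOL 0

      struct class_view {
          unsigned int id;
          unsigned int total_pages;
          unsigned int free_chunks;
          unsigned int chunks_per_page;
      };

      /* Returns the class id that should donate a page, or -1 for none. */
      static int pick_automove_source(const struct class_view *c, int nclasses) {
          for (int i = 1; i < nclasses; i++) {        /* class 0 is the pool itself */
              if (c[i].total_pages > 1 &&
                  c[i].free_chunks > c[i].chunks_per_page * 2)
                  return (int)c[i].id;
          }
          return -1;
      }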
* properly shuffle page list after slab move (dormando, 2015-11-18, 1 file, -8/+12)
  Used to take the newest page of the page list and replace the oldest page with it, so only the first page we move from a slab class would actually be "old". Instead, actually burn the slight CPU to shuffle all of the pointers down one. Now we always chew the oldest page.
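  A small sketch of the shuffle, assuming illustrative field names:

      /* Drop slab_list[0] (the oldest page) and slide the remaining pointers
       * left by one, so index 0 always holds the oldest page. */
      #include <string.h>

      struct page_list {
          void **slab_list;     /* page pointers, oldest first */
          unsigned int slabs;   /* number of pages in the list */
      };

      static void remove_oldest_page(struct page_list *p) {
          memmove(&p->slab_list[0], &p->slab_list[1],
                  sizeof(void *) * (p->slabs - 1));
          p->slabs--;
      }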
* slab mover rescues valid items with free chunks (dormando, 2015-11-18, 6 files, -9/+80)
  During a slab page move, items are typically ejected regardless of their validity. Now, if an item is valid and free chunks are available in the same slab class, copy the item over and replace it. It's up to external systems to try to ensure free chunks are available before moving a slab page; if there is no memory, items are simply evicted as normal. Also adds counters so we can finally tell how often these cases happen.
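  A minimal sketch of the rescue path, with hypothetical helper names standing in for the real hash/LRU plumbing:

      /* If the item being displaced is valid and its class has a free chunk,
       * copy it over and relink it; otherwise the caller evicts as before. */
      #include <string.h>
      #include <stdbool.h>
      #include <stddef.h>

      extern void *pop_free_chunk(unsigned int clsid);      /* hypothetical */
      extern void  relink_item(void *old_it, void *new_it); /* hypothetical */

      static bool rescue_item(void *old_it, size_t total_size, unsigned int clsid) {
          void *new_it = pop_free_chunk(clsid);
          if (new_it == NULL)
              return false;                 /* no free memory: evict as normal */
          memcpy(new_it, old_it, total_size);
          relink_item(old_it, new_it);      /* swap hash table / LRU pointers */
          return true;
      }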
* restore slab_automove=2 support and add test (dormando, 2015-11-18, 2 files, -0/+64)
  The test is a port of a golang test submitted by Scott Mansfield. There used to be an "angry birds mode" for slabs_automove, which attempts to force a slab move from "any" slab into the one which just had an eviction. This is an imperfect but fast way of responding to shifts in memory requirements. This change adds it back in, plus a test which very quickly attempts to set data via noreply. This isn't the end of improvements here; this commit is a starting point.
* minor fixes and docs for listen_disabled timer (dormando, 2015-11-18, 4 files, -3/+9)
  Adds a handful of missing stats docs (doh) and shortens the STATS_LOCK hold a little bit. The rest seems fine.
* Record and report on time spent in listen_disabled (Ian Miell, 2015-11-18, 4 files, -4/+16)
* Update manpage for -I command (Mattias Geniar, 2015-11-18, 1 file, -2/+3)
  Update the documentation to more accurately reflect the value that should be passed as the parameter on the command line.
* release memory before exiting (Yongyue Sun, 2015-11-18, 1 file, -1/+1)
  'buf' can be freed after calling atoi, giving a gentler way to exit.
  Signed-off-by: Yongyue Sun <abioy.sun@gmail.com>
* fix the misuse of settings.hot_lru_pct in process_stat_settings (wangkang-xy, 2015-11-18, 1 file, -1/+1)
* fix typo in 'touch' protocol documentation (kenvifire, 2015-11-18, 1 file, -1/+1)
* [doc] data returned by "stats sizes" missing a STAT (Alwayswithme, 2015-11-18, 1 file, -1/+1)
* copy original arg to avoid changing it with getsubopt() (Antony Dovgal, 2015-11-18, 1 file, -2/+3)
* check for NULL values and don't crash (Antony Dovgal, 2015-11-18, 1 file, -0/+8)
* Setting the pthread id of the LIBEVENT_THREAD on thread creation (Saman Barghi, 2015-11-18, 1 file, -2/+1)
* Fix for 310: memcached unable to bind to an ipv6 address (githublvv, 2015-11-18, 1 file, -5/+30)
  URL: https://code.google.com/p/memcached/issues/detail?id=310
* Update manpage for -l (Roman Mueller, 2015-11-18, 1 file, -4/+7)
* Do not bind to same interface more than once (Sharif Nassar, 2015-11-18, 1 file, -0/+3)
  Before 6ba9aa2771adcf785fe3fde03cd71832db15b086, multiple -l arguments were ignored and the last option passed was the one actually used. This changes the behaviour to silently ignore duplicate -l options rather than crash and burn on the bind() call.
* add man page for memcached-tool (Miroslav Lichvar, 2015-11-18, 1 file, -0/+71)
* fix usage text for -b option (Miroslav Lichvar, 2015-11-18, 1 file, -1/+1)
* describe -b and -S options in man page (Miroslav Lichvar, 2015-11-18, 1 file, -0/+7)
* fixed libevent version check: add the missing 1.0.x version check (mdl, 2015-11-18, 1 file, -1/+1)
* Add instance job for memcached (Cameron Norman, 2015-11-18, 1 file, -0/+26)
  Instance jobs are automatically started and stopped by the main job.
* Added main memcached job (Cameron Norman, 2015-11-18, 1 file, -0/+25)
* Use unlink instead of `rm -f` in start-memcached script (Cameron Norman, 2015-11-18, 1 file, -1/+1)
  This is necessary for an Upstart job, because the call out to rm -f causes Perl to fork off a process, and Upstart thinks that new process is the main one (not the exec'd memcached process later on).
* remove duplicated "#include " (zhoutai, 2015-11-18, 1 file, -1/+0)
* fix configure.ac warning and use system automake (dormando, 2015-11-18, 2 files, -2/+2)
  There should always be an 'automake' alias, and we haven't had an "unsupported" version in probably ten years. Hopefully this stops systems with upgraded automakes from breaking every time.
* fix off-by-one in LRU crawler (dormando, 2015-10-26, 1 file, -1/+1)
  The histogram used buckets 1-60 instead of 0-59, so bucket 60 could cause a segfault.
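  A tiny sketch of the off-by-one, assuming a 60-slot histogram as described:

      /* With a 60-slot array the valid indexes are 0-59; writing slot 60
       * runs past the end. Clamping keeps the last bucket in range. */
      #define HIST_BUCKETS 60

      static unsigned long histogram[HIST_BUCKETS];

      static void bump_bucket(unsigned int bucket) {
          if (bucket >= HIST_BUCKETS)
              bucket = HIST_BUCKETS - 1;   /* clamp to index 59 */
          histogram[bucket]++;
      }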
* remove another invalid assert() (dormando, 2015-07-04, 1 file, -1/+0)
  Slab class 255 is now valid, and the class id is a uint8_t.
* stop clang from whining about asserts (tag: 1.4.24) (dormando, 2015-04-25, 1 file, -4/+0)
  We now use up to exactly clsid 255, which is the maximum value of a byte, so the assertion can't fail.
* relax timing glitch in the lru maintainer test (dormando, 2015-04-25, 1 file, -1/+8)
  This test was requiring that the juggler thread run at all before the stats check happens. I've tried running this under an rPi1 and can't reproduce the race, but for some reason solaris amd64 can. This is likely due to usleep not working as expected. Unfortunately I don't have direct access to a solaris host, so this is the best I can do for now. The juggler does eventually wake up, so I'm unconcerned.
* fix major off by one issue (dormando, 2015-04-24, 1 file, -1/+1)
  None of my machines could repro a crash, but it's definitely wrong :/ Very sad.
* don't overwrite stack during slab_automove (tag: 1.4.23) (dormando, 2015-04-19, 1 file, -1/+1)
  Every time slab_automove would run it would segfault immediately, since the call out into items.c would overwrite its stack.
* fix off-by-one with slab management (dormando, 2015-04-19, 4 files, -6/+5)
  Data going into the highest slab class was landing in unallocated memory. Thanks to pyry for the repro case:

      perl -e 'use Cache::Memcached;$memd = new Cache::Memcached { servers=>["127.0.0.1:11212"]};for(20..1000){print "$_\n";$memd->set("fo2$_", "a"x1024)};'

  (run in a loop) against:

      ./memcached -v -m 32 -p 11212 -f 1.012

  This serves as a note to turn this into a test.
* Make LRU crawler work from maint thread (dormando, 2015-02-13, 6 files, -10/+22)
  The condition signal wasn't being sent after a refactor :( Also adds some stats to inspect how much work the LRU crawler is doing, and removes some printf noise from the LRU maintainer.
* basic lock around hash_items counter (dormando, 2015-02-06, 1 file, -0/+5)
  Could/should be an atomic. Previously all write mutations were wrapped with cache_lock, but that's not the case anymore. Just enforce consistency around the hash_items counter, which is used for hash table expansion.
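  A minimal sketch of both options mentioned (a small dedicated lock, or the atomic it "could/should" be), with illustrative names:

      #include <pthread.h>
      #include <stdatomic.h>

      /* Option 1: a dedicated mutex just for the counter. */
      static pthread_mutex_t hash_items_lock = PTHREAD_MUTEX_INITIALIZER;
      static unsigned long hash_items = 0;   /* read by hash table expansion */

      static void hash_items_inc(void) {
          pthread_mutex_lock(&hash_items_lock);
          hash_items++;
          pthread_mutex_unlock(&hash_items_lock);
      }

      /* Option 2: a C11 atomic counter, no lock needed. */
      static atomic_ulong hash_items_atomic;

      static void hash_items_inc_atomic(void) {
          atomic_fetch_add_explicit(&hash_items_atomic, 1, memory_order_relaxed);
      }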
* fix crawler/maintainer threads starting with -d (dormando, 2015-02-06, 1 file, -8/+14)
  The fork is racy, and the lru crawler or maintainer threads could end up not starting under daemonization. So we start them post-fork now. Thanks pyry for the report!
* spinlocks never seem to help in benchmarks (dormando, 2015-01-09, 1 file, -6/+1)
  If a thread is allowed to go to sleep, it can be woken up early as soon as the lock is freed. If we spinlock, the scheduler can't help us, and threads randomly run out their timeslice until the thread actually holding the lock finishes its work. In my benchmarks, killing the spinlock only makes things better.
* small crawler refactor (dormando, 2015-01-09, 3 files, -32/+67)
  Separates the start function from what was string parsing and allows passing in the 'remaining' value as an argument. Also adds a (not yet configurable) setting for how many crawls to run per sleep, to raise the default aggressiveness of the crawler.