|
|
|
|
| |
allow users to differentiate thread functions externally to memcached.
Useful for setting priorities or pinning threads to CPUs.
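One way this can be used, as a rough sketch not taken from this commit: give each worker thread a recognizable name via the non-POSIX pthread_setname_np() extension so external tools can find and pin it. The "mc-worker-%d" name and the helper are purely illustrative.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>

    /* Illustrative sketch: name a worker thread so tools like top -H or
     * taskset can identify it from outside the process. Assumes the glibc
     * pthread_setname_np() extension; names are capped at 15 chars + NUL. */
    static void name_worker_thread(pthread_t tid, int id) {
        char name[16];
        snprintf(name, sizeof(name), "mc-worker-%d", id);
        pthread_setname_np(tid, name);
    }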
|
|
|
|
|
|
|
|
|
|
|
| |
clang-15+ has started diagnosing function declarations without a prototype as errors:
thread.c:925:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
| void STATS_UNLOCK() {
| ^
| void
Signed-off-by: Khem Raj <raj.khem@gmail.com>
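A minimal sketch of the kind of change this implies: declaring the function with an explicit (void) parameter list satisfies -Wstrict-prototypes. The mutex and body shown here are illustrative, not the actual diff.

    #include <pthread.h>

    static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Prototyped form; the old "void STATS_UNLOCK() {" declaration is what
     * clang 15+ rejects under -Wstrict-prototypes. */
    void STATS_UNLOCK(void) {
        pthread_mutex_unlock(&stats_lock);
    }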
|
|
|
|
|
|
|
| |
extstore.h is now only used from storage.c. starting a path towards
getting the storage interface to be more generalized.
should be no functional changes.
|
|
|
|
| |
the classid is expected to be set correctly on freed memory.
|
|
|
|
|
|
|
|
|
|
|
| |
t/64bit.t .. 1/6
Failed test 'expected (faked) value of total_malloced'
at t/64bit.t line 30.
got: '32'
expected: '4294967328'
Failed test 'hit size limit'
at t/64bit.t line 41.
On Windows, long is only 32 bits, even on 64-bit platforms.
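A small illustration of the truncation (not part of the commit): on LLP64 platforms such as 64-bit Windows, long is 32 bits, so the faked byte total wraps to 32, while a fixed-width 64-bit type keeps the full value.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        long     narrow = (long)4294967328ULL;   /* 2^32 + 32: becomes 32 on LLP64 */
        uint64_t wide   = 4294967328ULL;         /* stays 4294967328 everywhere */
        printf("narrow=%ld wide=%llu\n", narrow, (unsigned long long)wide);
        return 0;
    }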
|
|
|
|
|
|
|
|
|
|
|
| |
If the mmap file is reused but the memory isn't supposed to be reused,
pages are thrown into the global page pool. Normally when pages are
released into the pool the header of the page is zero'ed so the
restart_check() code will know to place it back into the global pool.
When restarting multiple times, the slabs_prefill() part of the startup
code was missing this zero'ing step, so the _next_ time a restart happened
restart_check() could incorrectly attempt to recover that memory.
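A hypothetical sketch of the missing step, using illustrative names rather than the actual memcached structures: whenever a page is handed to the global pool, its header is zeroed so restart_check() later treats it as pooled memory.

    #include <string.h>

    /* Illustrative only: zero the page header before pooling the page, so a
     * later restart_check() pass sees a free page instead of stale item data. */
    static void release_page_to_global_pool(void *page, size_t header_size) {
        memset(page, 0, header_size);
        /* ...then link the page into the global page pool... */
    }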
|
|
|
|
|
|
| |
An ifdef was added for FreeBSD at some point since it gives you 0'd out
memory from malloc, but memory from the global pool isn't always from
malloc.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change removes the unused environment variable for running N
chunk move checks per slab lock and fixes the "completed" check.
My idea for the "completed" bits was originally to skip acquiring the
slabs lock for items we know to be complete. The check was accidentally
made redundant by the post-lock flags check, so there was no actual
speedup during busy loops.
This should now have much less performance impact for page moving.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PR #524 speeds up the slab page mover for some situations: if an item
is expired, but not yet reaped (by a fetch, or LRU crawler), the slab
page mover unlinks the item, marks it as free, and doesn't mark the
loop as "Busy". If any items were marked as 'busy' (needing to be
freed, re-checked, or actively being used) the page mover will re-scan
all entries from the top.
With this change the re-scan becomes less likely.
The bug is in the check for chunked items. `ch` is only set if an
ITEM_CHUNK is detected. If ITEM_CHUNKED is instead detected we also
need to handle it with the full free routine. This was not done. This
frees the header item (CHUNKED), and then leaks all of the chunks
associated with it.
Getting into this situation is hard:
- have an expired chunked item in memory that the LRU crawler hasn't
caught.
- have the page mover find and move the page specifically with the
header item, not an individual chunk (these are in the small slab
classes).
- later on pages with the leaked chunks get moved to other slab
classes. The slab mover thinks they're either free or that it needs to
unlink the orphaned chunks. This should then crash in the page mover
since the next/prev headers may no longer make sense. Or, worse, this
has to go on for a while if they happen to still make sense.
This one-liner fixes it.
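As a rough sketch of what that one-liner has to accomplish (flag values and names here are illustrative, not memcached's actual definitions): both an individual chunk and a chunked header must be routed through the full free path, otherwise the header is freed alone and its chain leaks.

    #define ITEM_CHUNK   (1 << 0)   /* illustrative bit values */
    #define ITEM_CHUNKED (1 << 1)

    /* An item needs the full free routine if it is either an individual
     * chunk or the header of a chunk chain. */
    static int needs_full_free(unsigned int it_flags) {
        return (it_flags & (ITEM_CHUNK | ITEM_CHUNKED)) != 0;
    }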
|
|
|
|
| |
Uses super pages via native FreeBSD capabilities.
|
|
|
|
|
|
|
|
|
| |
- `slabs_rebalance_lock`
- `slab_rebalance_cond`
- `maintenance_lock`
- `lru_crawler_lock`
- `lru_crawler_cond`
- `lru_maintainer_lock`
|
|
|
|
|
|
|
|
| |
For full discussion see: https://github.com/memcached/memcached/pull/524
- Avoids looping in most cases where an item had to be force-freed
- Avoids re-locking and re-checking already completed memory
- Uses a backoff timer for sleeping when busy items are found
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"-e /path/to/tmpfsmnt/file"
SIGUSR1 for graceful stop
restart requires the same memory limit, slab sizes, and some other
infrequently changed details. Most other options and features can
change between restarts. Binary can be upgraded between restarts.
Restart does some fixup work on start for every item in cache. Can take
over a minute with more than a few hundred million items in cache.
Keep in mind when a cache is down it may be missing invalidations,
updates, and so on.
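As an illustrative example (paths and numbers are placeholders): start with "-e /path/to/tmpfsmnt/file" plus the usual memory limit and slab sizes, stop gracefully with "kill -USR1 <pid>", then start the (possibly upgraded) binary again with the same -e path, memory limit, and slab sizes to recover the cache contents.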
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mem_requested is an oddball counter: it's the total number of bytes
"actually requested" from the slab's caller. It's mainly used for a
stats counter, alerting the user that the slab factor may not be
efficient if the gap between total_chunks * chunk_size - mem_requested
is large.
However, since chunked items were added it's _also_ used to help the
LRU balance itself. The total number of bytes used in the class vs the
total number of bytes in a sub-LRU is used to judge whether to move
items between sub-LRU's.
This is a layer violation, forcing slabs.c to know more about how items
work, and (with EXTSTORE) how to calculate item sizes from headers.
Further, it turns out it wasn't necessary for item allocation: if we
need to evict an item we _always_ pull from COLD_LRU or force a move
from HOT_LRU. So the total doesn't matter.
The total does matter in the LRU maintainer background thread. However,
this thread caches mem_requested to avoid hitting the slab lock too
frequently. Since sizes_bytes[] within items.c is generally redundant
with mem_requested, we now total sizes_bytes[] from each sub-LRU before
starting a batch of LRU juggles.
This simplifies the code a bit, reduces the layer violations in slabs.c
slightly, and actually speeds up some hot paths as a number of branches
and operations are removed completely.
This also fixes an issue I was having with the restartable memory
branch :) recalculating p->requested and keeping a clean API is painful
and slow.
NOTE: This will vary a bit compared to what mem_requested originally
did, mostly for large chunked items.
For items which fit inside a single slab chunk, the stat is identical.
However, items constructed by chaining chunks will have a single large
"nbytes" value and end up in the highest slab class. Chunked items can
be capped with chunks from smaller slab classes; you will see
utilization of chunks but not an increase in mem_requested for them.
I'm still thinking this through but this is probably acceptable. Large
chunked items should be accounted for separately, perhaps with some new
counters so they can be discounted from normal calculations.
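A hypothetical sketch of the totaling step described above (array layout, names, and the set of sub-LRUs are illustrative): the background thread sums the per-sub-LRU byte counters for a class before starting a batch of juggles, instead of caching mem_requested from slabs.c.

    #include <stdint.h>

    #define SUB_LRUS 4   /* e.g. HOT/WARM/COLD/TEMP; illustrative */

    /* sizes_bytes[] is assumed to hold one byte counter per sub-LRU per class. */
    static uint64_t class_bytes_total(const uint64_t sizes_bytes[], int class_id) {
        uint64_t total = 0;
        for (int i = 0; i < SUB_LRUS; i++) {
            total += sizes_bytes[class_id * SUB_LRUS + i];
        }
        return total;
    }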
|
|
|
|
|
|
|
|
|
| |
did a weird dance. nsuffix is no longer an 8-bit length; it is replaced
with an ITEM_CFLAGS bit, which indicates whether or not a 32-bit set of
client flags is stored in the item.
This became possible after removing the inlined ascii response header in
the previous commit.
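A hypothetical sketch of what the flag bit implies for readers of an item (the bit value and data layout are illustrative): if ITEM_CFLAGS is set, a 32-bit set of client flags is stored with the item; if not, the flags default to zero.

    #include <stdint.h>
    #include <string.h>

    #define ITEM_CFLAGS (1 << 7)   /* illustrative bit value */

    static uint32_t read_client_flags(unsigned int it_flags, const char *item_data) {
        uint32_t cflags = 0;
        if (it_flags & ITEM_CFLAGS) {
            memcpy(&cflags, item_data, sizeof(cflags));  /* assumed to sit at the front of the data area */
        }
        return cflags;   /* items stored without flags report 0 */
    }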
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
some whackarse ARM platforms on specific glibc/gcc (new?) versions trip
SIGBUS while reading the header chunk for a split item.
the header chunk is unfortunate magic: It lives in ITEM_data() at a random
offset, is zero sized, and only exists to simplify code around finding the
original slab class, and linking/relinking subchunks to an item.
there's no fix to this which isn't a lot of code. I need to refactor chunked
items, and attempted to do so, but couldn't come up with something I liked
quickly enough.
This change pads the first chunk if alignment is necessary, which wastes
bytes and a little CPU, but I'm not going to worry a ton for these obscure
platforms.
this works with rebalancing because in the case of ITEM_CHUNKED header, it
treats the item size as the size of the class it resides in, and memcpy's the
item during recovery.
all other cases were changed from ITEM_data to a new ITEM_schunk() inline
function, which is created when NEED_ALIGN is set and is otherwise still
equal to ITEM_data.
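A rough sketch of the padding idea (not the actual ITEM_schunk() implementation): when NEED_ALIGN is defined, the offset of the embedded header chunk is rounded up to pointer alignment, trading a few wasted bytes for safe access on strict-alignment platforms.

    #include <stddef.h>

    static size_t schunk_offset(size_t data_offset) {
    #ifdef NEED_ALIGN
        const size_t a = sizeof(void *);
        return (data_offset + a - 1) & ~(a - 1);   /* pad up: avoids SIGBUS on picky ARM targets */
    #else
        return data_offset;                        /* same as the plain ITEM_data() math */
    #endif
    }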
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Linux has supported transparent huge pages for quite some time.
Memory regions can be marked for conversion to huge pages with madvise.
Alternatively, users can have the system default to using huge pages for
all memory regions when applicable, i.e. when the mapped region is large
enough, the properly aligned pages will be converted.
Using either method, we would preallocate memory for the cache with
proper alignment, and call madvise on it. Whether the memory region
actually gets converted to hugepages ultimately depends on the setting
of /sys/kernel/mm/transparent_hugepage/enabled. The existence of this
file is also checked to see if transparent huge pages support is compiled
into the kernel.
If any step of the preallocation fails, we simply fallback to standard
allocation, without even preallocating slabs, as they would not have
the proper alignment or settings anyway.
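A hypothetical sketch of the preallocation path described above (the 2 MB huge page size and the function name are assumptions): allocate the cache arena aligned to the huge page size, then ask the kernel to back it with transparent huge pages; a failure lets the caller fall back to standard allocation.

    #include <stdlib.h>
    #include <sys/mman.h>

    static void *alloc_hugepage_arena(size_t limit) {
        const size_t huge = 2 * 1024 * 1024;      /* assumed huge page size */
        void *base = NULL;
        if (posix_memalign(&base, huge, limit) != 0)
            return NULL;                          /* caller falls back to plain allocation */
    #ifdef MADV_HUGEPAGE
        madvise(base, limit, MADV_HUGEPAGE);      /* best effort; THP may still be disabled */
    #endif
        return base;
    }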
|
|
|
|
|
|
|
| |
allows reassigning memory from global page pool to a specific class.
this allows simplifying the algorithm to rely on moving memory to/from
global, removing hacks around relaxing free memory requirements.
|
|
|
|
|
|
|
|
|
|
|
|
| |
this fix may be replaced with a better restructuring; as this was done more
properly in the result handler code below. Not releasing the slab lock while
unlinking can cause a deadlock:
item A unlinks, which locks LRU, then tries to lock SLAB
page mover locks SLAB, locks item B, tries to unlink item B
if done after A locks LRU, it deadlocks while B tries to lock LRU.
doh :/
|
|
|
|
|
|
|
|
|
|
|
|
| |
the slab page mover algorithm would let memory fill up to the point of
evicting for a few seconds before jumping to life and shoveling some
pages into the global pool.
swore I was going to fix this post-release, but I had a moment of inspiration
after finding some code from another branch that did half the work. After a
bunch of stupid bugs it seems to work.
-o slab_automove_freeratio=0.N is now an option. This is *percentage of total
memory*, so don't set it too high.
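For example (illustrative numbers): -o slab_automove_freeratio=0.01 would aim to keep roughly 1% of total memory sitting free in the global page pool.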
|
|
|
|
|
|
|
|
| |
couple TODO items left for a new issue I thought of. Also hardcoded memory
buffer size which should be fixed.
also need to change the "free and re-init" logic to use a boolean in case any
related option changes.
|
|
|
|
|
|
| |
uses actual mem limit... may be wrong if slab_reassign isn't enabled :(
allows better judgement for flusher.
|
|
|
|
|
|
|
| |
if we want to disable automove and run an external algo that examines the
page pool count...
removes temp hack from external script.
|
|
|
|
|
| |
page mover didn't have a reference to the storage object, so items lost
during page move transitions wouldn't be decremented from storage.
|
|
|
|
|
| |
been squashing, reorganizing, and pulling code off to go upstream ahead
of merging the whole branch.
|
|
|
|
|
|
|
|
| |
waits at least a full millisecond before scanning the page again. should
hammer a lot less in the background when stuck.
could probably up more, but want to keep it relatively aggressive in case of
hot memory that it might have to free a few times.
|
|
|
|
| |
converts the python script to C, more or less.
|
|
|
|
|
|
|
|
|
| |
if we loop through a slab too many times without freeing everything, delete
items stuck with high refcounts. they should bleed off so long as the
connections aren't jammed holding them.
should be possible to force rescues in this case as well, but that's more code
so will follow up later. Need a big-ish refactor.
|
|
|
|
|
|
|
| |
plus a locking fix for slabs reassign.
lru maintainer can call into slabs reassign + lru crawler, so we need to pause
it before attempting to pause the other two threads.
|
|
|
|
| |
also fixes memory tracking bug when releasing memory.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* accesses
* amount
* append
* command
* cyrillic
* daemonize
* detaches
* detail
* documentation
* dynamically
* enabled
* existence
* extra
* implementations
* incoming
* increment
* initialize
* issue
* javascript
* number
* optimization
* overall
* pipeline
* reassign
* reclaimed
* response
* responses
* sigabrt
* specific
* specificity
* tidiness
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory chunk chains would simply stitch multiple chunks of the highest slab
class together. If your item was 17k and the chunk limit is 16k, the item
would use 32k of space instead of a bit over 17k.
This refactor simplifies the slab allocation path and pulls the allocation of
chunks into the upload process. A "large" item gets a small chunk assigned as
an object header, rather than attempting to inline a slab chunk into a parent
chunk. It then gets chunks individually allocated and added into the chain
while the object uploads.
This solves a lot of issues:
1) When assembling new, potentially very large items, we don't have to sit and
spin evicting objects all at once. If there are 20 16k chunks in the tail and
we allocate a 1 meg item, the new item will evict one of those chunks
in between each read, rather than trying to guess how many loops to run before
giving up. Very large objects take time to read from the socket anyway.
2) Simplifies code around the initial chunk. Originally embedding data into
the top chunk and embedding data at the same time required a good amount of
fiddling. (Though this might flip back to embedding the initial chunk if I can
clean it up a bit more).
3) Pulling chunks individually means the slabber code can be flattened to not
think about chunks aside from freeing them, which culled a lot of code and
removed branches from a hot path.
4) The size of the final chunk is naturally set to the remaining amount of
bytes that need to be stored, which means chunks from another slab class can
be pulled to "cap off" a large item, reducing memory overhead (see the
sketch below).
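A minimal sketch of point 4 (names are illustrative; the slab-class lookup is assumed to happen elsewhere): only the leftover bytes determine the size of the last chunk, so it can be requested from a smaller slab class.

    #include <stddef.h>

    /* Size the last chunk in a chain to just the leftover bytes, so it can be
     * capped with a chunk from a smaller slab class. */
    static size_t final_chunk_size(size_t total_bytes, size_t full_chunk_size) {
        size_t remainder = total_bytes % full_chunk_size;
        return remainder ? remainder : full_chunk_size;
    }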
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when I first split the locks up further I had a trick where "item_remove()"
did not require holding the associated item lock. If an item were to be freed,
it would then do the necessary work.
Since then, all calls to refcount_incr and refcount_decr only happen while the
item is locked. This was mostly due to the slab mover being very tricky with
locks. The atomic is no longer needed as the refcount is only ever checked
after a lock to the item.
Calling atomics is pretty expensive, especially in multicore/multisocket
scenarios. This yields a notable performance increase.
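A hypothetical sketch of what dropping the atomics amounts to (types and names are illustrative): since refcount_incr()/refcount_decr() are only ever called with the item lock held, a plain read-modify-write is sufficient.

    /* Caller must hold the item's lock; no atomic operation needed. */
    static inline unsigned short refcount_incr(unsigned short *refcount) {
        return ++*refcount;
    }

    static inline unsigned short refcount_decr(unsigned short *refcount) {
        return --*refcount;
    }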
|
|
|
|
|
| |
-I 2m would still allocate 2mb pages, then only use 1mb of it, halving memory
capacity.
|
|
|
|
|
|
|
|
|
|
| |
also fixes the new LRU algorithm to balance by total bytes used rather than
total chunks used, since total chunks used isn't tracked for multi-chunk
items.
also fixes a bug where the lru limit wasn't being utilized for HOT_LRU.
also some cleanup from previous commits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
has spent some time under performance testing. For larger items there's less
than 5% extra CPU usage; however, the max usable CPU when using large items is
1/10th or less before you run out of bandwidth. Mixed small/large items will
still balance out.
comments out debugging (which must be removed for release).
restores defaults and ensures only t/chunked-items.t is affected.
dyn-maxbytes and item_size_max tests still fail.
append/prepend aren't implemented, sasl needs to be guarded.
slab mover needs to be updated.
|
|
|
|
|
|
|
|
|
| |
can actually fetch items now, and fixed a few bugs with storage/freeing.
added fetching for binprot.
added some basic tests.
many tests still fail for various reasons, and append/prepend isn't fixed yet.
|
|
|
|
|
| |
can set and store large items via asciiprot. gets/append/prepend/binprot not
implemented yet.
|
|
|
|
|
|
|
|
|
|
| |
tons of stats were left in the global stats structure that are no longer
used, and it looks like we kept accidentally adding new ones in there.
There's also an unused mutex.
Split global stats into `stats` and `stats_state`. Initialize via memset,
reset only `stats` via memset, removing several places where stats values get
repeated. Looks much cleaner and should be less error prone.
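A rough sketch of the shape of the split (field names are illustrative): counters that a stats reset should wipe live in `stats`, while state that must persist lives in `stats_state`, so a single memset covers the reset.

    #include <string.h>

    struct stats_counters { unsigned long long get_cmds, set_cmds, evictions; };  /* resettable */
    struct stats_running  { unsigned int curr_conns, curr_items; };               /* persistent */

    static struct stats_counters stats;
    static struct stats_running  stats_state;

    static void stats_reset(void) {
        memset(&stats, 0, sizeof(stats));   /* stats_state is deliberately untouched */
    }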
|
|
|
|
|
|
|
|
|
|
| |
Allows dynamically increasing the memory limit of a running system, if memory
isn't being preallocated.
If `-o modern` is in use, can also dynamically lower memory usage. pages are
free()'ed back to the OS via the slab rebalancer as memory is freed up. Does
not guarantee the OS will actually give the memory back for other applications
to use; that depends on how the OS handles memory.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"-o slab_sizes=100-200-300-400-500" will create 5 slab classes of those
specified sizes, with the final class being item_max_size.
Using the new online stats sizes command, it's possible to determine if the
typical factor-based slab class growth rate doesn't align well with how items
are stored.
This is dangerous unless you really know what you're doing. If your items have
an exact or very predictable size this makes a lot of sense. If they do not,
the defaults are safer.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"stats sizes" is one of the lack cache-hanging commands. With millions of
items it can hang for many seconds.
This commit changes the command to be dynamic. A histogram is tracked as items
are linked and unlinked from the cache. The tracking is enabled or disabled at
runtime via "stats sizes_enable" and "stats sizes_disable".
This presently "works" but isn't accurate. Giving it some time to think over
before switching to requiring that CAS be enabled. Otherwise the values could
underflow if items are removed that existed before the sizes tracker is
enabled. This attempts to work around it by using it->time, which gets updated
on fetch, and is thus inaccurate.
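A hypothetical sketch of the dynamic tracking (bucket width and names are illustrative): the histogram is adjusted as items are linked and unlinked, so answering "stats sizes" no longer requires walking the whole cache.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define SIZE_BUCKETS 1024
    static uint64_t size_hist[SIZE_BUCKETS];
    static bool sizes_tracking;   /* toggled by "stats sizes_enable" / "stats sizes_disable" */

    static void sizes_item_linked(size_t sz)   { if (sizes_tracking) size_hist[(sz / 64) % SIZE_BUCKETS]++; }
    static void sizes_item_unlinked(size_t sz) { if (sizes_tracking) size_hist[(sz / 64) % SIZE_BUCKETS]--; }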
|
|
|
|
| |
total_items is pretty easy to overflow. Upped some of the others just in case.
|
|
|
|
|
|
|
|
|
|
| |
musl libc will warn if you include sys/signal.h instead of signal.h as
specified by POSIX. The build will fail due to -Werror explicitly being
set.
Fix it by using the POSIX location.
fixes #138
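The change amounts to switching to the POSIX header location, roughly:

    -#include <sys/signal.h>
    +#include <signal.h>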
|
|
|
|
|
|
|
|
|
|
|
| |
mem_alloced was getting increased every time a page was assigned out of either
malloc or the global page pool. This means total_malloced will inflate forever
as pages are reused, and once limit_maxbytes is surpassed it will stop
attempting to malloc more memory.
The result is we would stop malloc'ing new memory too early if page reclaim
happens before the whole thing fills. The test already caused this condition,
so adding the extra checks was trivial.
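A hypothetical sketch of the corrected accounting (names are illustrative): bytes only count toward mem_alloced/total_malloced when they genuinely come from malloc, not when a page is recycled out of the global page pool.

    #include <stdbool.h>
    #include <stddef.h>

    static size_t mem_alloced;   /* reported as total_malloced */

    static void account_page(size_t page_size, bool from_global_pool) {
        if (!from_global_pool) {
            mem_alloced += page_size;   /* reused pages no longer inflate the total */
        }
    }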
|
|
|
|
|
|
| |
previously the slab mover would evict items if the new chunk was within the
slab page being moved. now it will do an inline reclaim of the chunk and try
until it runs out of memory.
|
|
|
|
|
|
| |
gross oversight putting two conditions into the same variable. now can tell if
we're evicting because we're hitting the bottom of the free memory pool, or if
we keep trying to rescue items into the same page as the one being cleared.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
class 255 is now a legitimate class, used by the NOEXP LRU when the
expirezero_does_not_evict flag is enabled, so it can no longer double as a
marker for freed chunks. Instead, we now force a single bit ITEM_SLABBED
when a chunk is returned to the slabber, and ITEM_SLABBED|ITEM_FETCHED
means it's been cleared for a page move.
item_alloc overwrites the chunk's flags on set. The only weirdness was
slab_free |='ing in the ITEM_SLABBED bit. I tracked that down to a commit in
2003 titled "more debugging" and can't come up with a good enough excuse for
preserving an item's flags when it's been returned to the free memory pool. So
now we overload the flag meaning.
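A small sketch of the overloaded meaning (bit values are illustrative, not memcached's actual assignments): ITEM_SLABBED alone marks a chunk returned to the slabber, while ITEM_SLABBED together with ITEM_FETCHED marks a chunk already cleared for a page move.

    #define ITEM_SLABBED (1 << 2)   /* illustrative bit values */
    #define ITEM_FETCHED (1 << 3)

    static int cleared_for_page_move(unsigned int it_flags) {
        return (it_flags & (ITEM_SLABBED | ITEM_FETCHED)) == (ITEM_SLABBED | ITEM_FETCHED);
    }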
|
|
|
|
|
| |
uses the slab_rebal struct to summarize stats, only occasionally grabbing the
global lock to fill them in instead.
|
|
|
|
|
|
| |
During an item rescue, item size was being added to the slab class when the
new chunk was requested, and then not removed again from the total if the item was
successfully rescued. Now just always remove from the total.
|