Adds:
mcp.active_req_limit(count)
mcp.buffer_memory_limit(kilobytes)
Each limit is divided by the number of worker threads, creating a per-worker-thread limit on the number of concurrent proxy requests and on the number of bytes used specifically for value data. This does not represent total memory usage, but it will be close.
Buffer memory for inbound set requests is not accounted for until after the object has been read from the socket; this will be improved in a future update. It should be fine unless clients send just the SET request and then hang without sending further data.
Limits should be live-adjustable via configuration reloads.
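
A minimal configuration sketch (values and placement are illustrative; following the conventions of t/startfile.lua, these calls would live in the pool configuration stage):

```lua
-- hypothetical limits; tune to your deployment
function mcp_config_pools()
    -- cap concurrent proxy requests (divided across worker threads)
    mcp.active_req_limit(5000)
    -- cap value-buffer memory, in kilobytes (also split per worker)
    mcp.buffer_memory_limit(65536)
    -- ... backend and pool setup would follow ...
end
```

Since the limits are meant to be live-adjustable, reloading the configuration with new values should update them in place.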
Also changes the way the global context and thread contexts are fetched from Lua: via the VM extra space instead of upvalues, which is a little faster and more universal.
It was always erroneous to run many of the config functions from routes and vice versa, but there was no consistent strictness, so users could get into trouble.
`watch deletions`: logs all keys which are deleted using either the `delete` or `md` command.
Each log line contains the command used, the key, the clsid, and the size of the deleted item.
Items which result in a delete miss or are marked as stale do not show up in the logs.
Sending the 's' flag to metaset now returns the size of the item stored.
Useful if you want to know how large an appended/prepended item now is.
If the 'N' flag is supplied while in append/prepend mode, autovivification is allowed (with the exptime supplied from N) for append/prepend style keys that don't need headers created first.
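
An illustrative exchange (flag syntax follows the meta protocol; the byte counts and responses shown are assumptions, not captured output):

```
ms foo 3 MA N60 s
bar
HD s3
```

Here the mode-switch flag 'MA' requests append, 'N60' autovivifies a missing key with a 60 second exptime instead of returning a miss, and 's' asks for the stored item's size in the response.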
The "metadump" command was designed primarily for doing analysis on what's in cache, but it's also used for pulling the data out for various reasons.
Its string format is a bit onerous: key=value pairs (for futureproofing) and URI-encoded keys (which may or may not be binary internally).
This adds a command "mgdump", which dumps keys in the format:
"mg key\r\nmg key2\r\n"
If a key is binary encoded, it uses the meta binary encoding scheme of base64-ing keys and appends a "b" flag:
"mg 44OG44K544OI b\r\n"
When the dump is complete it prints an "EN\r\n".
Clients wishing to stream or fetch data can take the mg commands, strip the \r\n, append any flags they care about, then send the command back to the server to fetch the full key data.
This seems to use 30-40% less CPU time on the server for the same key dumps.
`mcp.pool(p, { dist = etc, iothread = true })`
By default the IO thread is not used; instead a backend connection is created for each worker thread. This can be overridden by setting `iothread = true` when creating a pool.
`mcp.pool(p, { dist = etc, beprefix = "etc" })`
If a `beprefix` is added to the pool arguments, it will create unique backend connections for this pool. This allows you to create multiple sockets per backend by making multiple pools with unique prefixes.
There are legitimate use cases for sharing backend connections across different pools, which is why that is the default behavior.
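
A configuration sketch combining the two options (backend labels and ports are hypothetical):

```lua
local b1 = mcp.backend("b1", "127.0.0.1", 11212)
local b2 = mcp.backend("b2", "127.0.0.1", 11213)

-- opt this pool into the shared IO thread instead of the
-- default per-worker backend connections
local shared = mcp.pool({ b1, b2 }, { iothread = true })

-- give these pools their own backend sockets; two unique
-- prefixes means two connections per backend
local hot  = mcp.pool({ b1, b2 }, { beprefix = "hot" })
local warm = mcp.pool({ b1, b2 }, { beprefix = "warm" })
```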
Allow freeing client response objects without the client object. Clean
some confusing logic around clearing memory. Also exposes an interface
for allocating unlinked response objects.
- removes unused "completed" IO callback handler
- moves primary post-IO callback handlers from the queue definition to
the actual IO objects.
- allows IO object callbacks to be handled generically instead of based
on the queue they were submitted from.
We want to start using cache commands in contexts without a client
connection, but the client object has always been passed to all
functions.
In most cases we only need the worker thread (LIBEVENT_THREAD *t), so
this change adjusts the arguments passed in.
proxy_req_active shows the number of active proxy requests, but if those proxy requests make sub-requests via mcp.await() they are not accounted for. This gives the number of active awaits, but not the total number of in-flight async requests.
Allows users to differentiate thread functions externally to memcached. Useful for setting priorities or pinning threads to CPUs.
This was a temporary hidden option for a few early adopters to use while migrating from OK to HD status codes.
extstore has a background thread which examines slab classes for items
to flush to disk. The thresholds for flushing to disk are managed by a
specialized "slab automove" algorithm. This algorithm was written in
2017 and not tuned since.
Most serious users set "ext_item_age=0" and force flush all items. This
is partially because the defaults do not flush aggressively enough,
which causes memory to run out and evictions to happen.
This change simplifies the slab automove portion. Instead of balancing
free chunks of memory per slab class, it sets a target of a certain
number of free global pages.
The extstore flusher thread also uses the page pool and some low chunk
limits to decide when to start flushing. Its sleep routines have also
been adjusted as it could oversleep too easily.
A few other small changes were required to avoid over-moving slab pages
around.
Also linux.
-l proto[ascii]:127.0.0.1:11211
accepts:
- ascii
- binary
- negotiating
- proxy
Allows running the proxy on the default listeners but going direct to memcached on a specific port, or binary and ascii on different ports, and so on.
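
For example (addresses, ports, and the config filename are illustrative, assuming a proxy-enabled build), the proxy can own the public port while a plain ascii listener stays on localhost:

```
memcached -o proxy_config=routes.lua \
    -l proto[proxy]:0.0.0.0:11211 \
    -l proto[ascii]:127.0.0.1:11311
```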
-l tag[asdfasdf]:0.0.0.0:11211
not presently used for anything outside of the proxy code.
Lua level API for logging full context of a request/response. Provides
log_req() for simple logging and log_reqsample() for conditional
logging.
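
A sketch of usage inside a route handler (the `mcp.attach` pattern is standard proxy configuration, but the exact `log_req()` argument list shown here is an assumption; `pool` is assumed to be defined earlier in the config):

```lua
mcp.attach(mcp.CMD_ANY_STORAGE, function(r)
    local res = pool(r)
    -- log the full request/response context with a detail tag
    mcp.log_req(r, res, "default-route")
    return res
end)
```

`log_reqsample()` would be used the same way when only a conditional sample of requests should be logged.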
Allows tests to run faster and lets users make it sleep for more or less time. Also cuts the sleep time down when actively compacting and when coming from high idle.
I wanted to do this via Lua with some on-close hooks on the coroutine, but this might work for now. Not 100% sure I caught all of the incr/decr cases properly. Was trying to avoid hitting the counters too hard as well.
Added to "stats proxy" output: counters of commands seen at inbound.
- fixes potential memory leaks if an error is generated while creating a
pool object.
- misc comment updates and error handling.
- avoid crash if attempting to route commands that don't have a key.
If a conn goes to sleep while reading set data from the network, its coroutine would be lost, causing a crash or corruption on the next resume.
This now properly handles the lifetime/cleanup of the coroutine.
Instead of automatically attempting to use io_uring if compiled in, require a start option.
Add two new stat keys, `store_too_large` and `store_no_memory`, to track
occurrences of storage request rejections due to writing too large of a
value and writing beyond available provisioned memory, respectively.
See BUILD for compilation details.
See t/startfile.lua for configuration examples.
(see also https://github.com/memcached/memcached-proxylibs for
extensions, config libraries, more examples)
NOTE: io_uring mode is _not stable_, will crash.
As of this commit it is not recommended to run the proxy in production.
If you are interested please let us know, as we are actively stabilizing
for production use.
The stat key `log_watchers` indicates the number of active connected
`watch` clients.
`-o ssl_min_version` can be used to configure the server to only accept
handshakes from clients with a minimum TLS protocol version. Currently
supported options are TLS v1.0, TLS v1.1, TLS v1.2, and TLS v1.3
(OpenSSL 1.1.1+ only).
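
For example (certificate paths are illustrative), rejecting handshakes below TLS 1.2 on a TLS-enabled build:

```
memcached -Z -o ssl_chain_cert=cert.pem,ssl_key=key.pem,ssl_min_version=tlsv1.2
```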
Probably squash into previous commit.
io->c->thread can change for orphaned IOs, so we had to directly add the original worker thread as a reference.
Also tried again to split callbacks onto the thread and off of the connection for similar reasons; sometimes we just need the callbacks, sometimes we need both.
Instead of passing ownership of (io_queue_t)*q to the side thread, ownership of the IO objects is passed to the side thread, and they are then individually returned. The worker thread runs return_cb() on each, determining when it's done with the response batch.
This interface could use more explicit functions to make it clearer. Ownership of *q isn't actually "passed" anywhere; it's just used or not used depending on which return function the owner wants.
Now that all of the reads/writes to the notify pipe are in one place, we can easily use Linux eventfd if available. This also allows batching events so we're not firing the same notifier constantly.
Worker notification was a mix of reading data from a pipe or examining an object queue stack. Now it's all one interface. This is necessary to switch signalling to eventfd or similar, since we won't have that pipe to work with.
help scalability a bit by having a per-worker-thread freelist and queue
for connection event items (new conns, etc). Also removes a hand-rolled
linked list and uses cache.c for freelist handling to cull some
redundancy.
Add support for `watch connevents` to report opened (`conn_new`)
and closed (`conn_close`) client connections. Event log lines indicate
the connection's remote IP, remote port, and transport type.
`conn_close` events additionally supply a reason for closing the connection.
Note: Do not fix typos in crc32.c because it's copied from an upstream
source
I had the response code as "HD" in the past, but standardized on OK while merging a number of "OK-like" rescodes together. This was a mistake, as many "generic" memcached response codes use "OK"; most of these are management or specialized uncommon commands.
With this, a client response parser can know for sure whether a response is to a meta command or to some other command.
A `-o meta_response_old` start-time option has been added, valid for the next 3 months, which switches the response code back from HD to OK, in case any existing users depended on this and need time to migrate.
incorrectly.
UDP_MAX_PAYLOAD_SIZE actually contains the length of the private UDP header, but resp->tosend only contains the length of the data part.
The number of required UDP packets calculated by the original code will be less than what is actually needed.
E.g.:
1000000/1400 = 714.2, ceil 715
1000000/1392 = 718.3, ceil 719
Actually 719 datagrams are needed; 715 is wrong.
Signed-off-by: AK Deng <ttttabcd@protonmail.com>
i.e.: ms [key] b
If the 'k' flag is given and the key is binary, it is returned binary encoded.
The conn_nread state handles c->item like an item, but allows it to be a temporary malloc'ed blob via setting c->item_malloced = true.
To be used for buffering value reads in the proxy code.
was defined in memcached.c, but also used in thread.c.
By default memcached assigns connections to worker threads in
a round-robin manner. This patch introduces an option to select
a worker thread based on the incoming connection's NAPI ID if
SO_INCOMING_NAPI_ID socket option is supported by the OS.
This allows a memcached worker thread to be associated with a
NIC HW receive queue and service all the connection requests
received on a specific RX queue. This mapping between a memcached
thread and a HW NIC queue streamlines the flow of data from the
NIC to the application. In addition, an optimal path with reduced
context switches is possible, if epoll based busy polling
(sysctl -w net.core.busy_poll = <non-zero value>) is also enabled.
This feature is enabled via a new command line parameter -N <num>
or "--napi_ids=<num>", where <num> is the number of available/assigned
NIC hardware RX queues through which the connections can be received.
The number of napi_ids specified cannot be greater than the number
of worker threads specified using -t/--threads option.
If the option is not specified, or the conditions are not met, the code
defaults to round-robin thread selection.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
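
An illustrative invocation (the queue count and busy-poll value are examples): eight worker threads, each mapped to one of eight NIC RX queues, with epoll busy polling enabled:

```
sysctl -w net.core.busy_poll=50
memcached -t 8 -N 8
```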
since multiple queues can be sent to different sidethreads, we need a
new mechanism for knowing when to return everything. In the common case
only one queue will be active, so adding a mutex would be excessive.
mc_resp is the proper owner of a pending IO once it's been initialized; release it during resp_finish(). Also adds a completion callback which runs on the submitted stack after returning to the worker thread but before the response is transmitted.
This allows re-queueing a pending IO if processing a response generates another pending IO, and allows a further refactor to run more extstore code on the worker thread instead of the IO threads.
Uses the proper conn_io_queue state to describe connections waiting for pending IOs.
Reserves space in an io_pending_t. Users cast it to a more specific structure, avoiding extra allocations for local data. In this case what might require 3 allocations stays as just 1.
Don't gate the deferred io_queue code on EXTSTORE. Removes a number of ifdefs and allows cleaner usage of the interface.
extstore.h is now only used from storage.c, starting a path towards getting the storage interface to be more generalized.
There should be no functional changes.
We want to reuse the deferred IO system for extstore for something else; this should allow evolving into a more plugin-centric system.
Step one of three(?): replace in place, and tests pass with extstore enabled.
Step two should move more extstore code into storage.c.
Step three should build the IO queue code without ifdef gating.