| Commit message | Author | Age | Files | Lines |
|
|
|
| |
If you had a numerical gap, you would print junk.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bug introduced in 6c80728: use-after-free of the response buffer while
under concurrency.
The await code has a different method of wrapping up a Lua coroutine
than a standard response, so it was not managing the lifecycle of the
response object properly, causing data buffers to be reused before being
written back to the client.
This fix separates the accounting of memory from the freeing of the
buffer, so there is no longer a race.
Further restructuring is needed to both make this less bug-prone and
keep memory accounting in lock step with memory freeing.
|
|
|
|
|
| |
A few code paths were returning SERVER_ERROR (a retryable error)
when they should have returned CLIENT_ERROR (bad protocol syntax).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the event handler rewrite the IO thread scales much better (up to
8-12 worker threads), leaving the io_uring code in the dust.
Realistically io_uring won't be able to beat the event code if you're
using kernels older than 6.2, which is brand new. Instead of carrying
all this code around and having people randomly try it to get more
performance, I want to rip it out of the way and add it back in later
when it makes sense.
I am using mcshredder as a platform to learn and keep up to date with
io_uring, and will port over its usage pattern when it's time.
|
|
|
|
|
|
|
|
|
|
| |
Cleans up logic around response handling in general. Allows returning
server-sent error messages upstream for handling.
In general SERVER_ERROR means we can keep the connection to the backend.
The rest of the errors are protocol errors, and while some are perfectly
safe to whitelist, clients should not be causing those sorts of errors
and we should cycle the backend regardless.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a client sends multiple requests in the same packet, the proxy would
reverse the requests before sending them to the backend. They would
return to the client in the correct order because top-level responses
are sent in the order they were created.
In practice I guess this is rarely noticed. If a client sends a series
of commands where the first one generates a syntax error, the other
commands would still succeed.
It would also trip people up when testing pipelined commands, as
read-your-write would fail because the write gets ordered after the read.
I did run into this before, but I thought it was just the ascii multiget
code reversing keys, which would be harmless as the whole command has to
complete regardless of key order.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds:
mcp.active_req_limit(count)
mcp.buffer_memory_limit(kilobytes)
Each limit is divided by the number of worker threads, creating a
per-worker-thread limit on the number of concurrent proxy requests and
on the bytes used specifically for value data. This does not represent
total memory usage but will be close.
Buffer memory for inbound set requests is not accounted for until after
the object has been read from the socket; to be improved in a future
update. This should be fine unless clients send just the SET request and
then hang without sending further data.
Limits should be live-adjustable via configuration reloads.
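A minimal sketch of setting these from the configuration stage; the
surrounding mcp_config_pools function and the numbers are illustrative
assumptions, not part of this commit:

  function mcp_config_pools()
      -- illustrative values; each limit is divided across the worker threads
      mcp.active_req_limit(1024)      -- max concurrent proxied requests
      mcp.buffer_memory_limit(32768)  -- max value-buffer memory, in kilobytes
      -- backend and pool setup would normally follow here
  end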
|
|
|
|
|
|
|
|
|
|
| |
Also changes the way the global context and thread contexts are fetched
from Lua: via the VM extra space instead of upvalues, which is a little
faster and more universal.
It was always erroneous to run many of the config functions from routes
and vice versa, but there was no consistent strictness, so users could
get into trouble.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The client connection state machine loops through a few states when
handling pipelined requests.
To start:
conn_waiting -> conn_read -> conn_parse_cmd (execution)
After conn_parse_cmd, we can enter:
conn_nread (read a mutation payload from the network) -> conn_new_cmd
or directly: conn_new_cmd
conn_new_cmd checks the limit specified in -R, flushing the pipeline if
we exceed that limit. Otherwise it wraps back to conn_parse_cmd.
The proxy code was _not_ resetting state to conn_new_cmd after any
non-mutation command. If a value was set it would properly run through
nread -> conn_new_cmd.
This means that clients issuing requests against a proxy server have
unlimited pipelines, and the proxy will buffer the entire result set
before beginning to return data to the client. Especially if requests
are for very large items, this can cause a very high Time To First Byte
in the response to the client.
|
|
|
|
|
|
|
|
|
|
|
| |
- Refcount leak on sets
- Move the response elapsed timer back closer to when the response was
processed, so as not to clobber the wrong IO object's data
- Restores error messages from set/ms
- Adds start of unit tests
Requests will look like they run a tiny bit faster than they do, but
I need to get the elapsed time there for a later change.
|
|
|
|
|
|
|
|
| |
local res = mcp.internal(r) - takes a request object and executes it
against the proxy's internal cache instance.
Experimental as of this commit. Needs more test coverage and
benchmarking.
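A minimal sketch of calling this from a route function; the mcp.attach()
call and the CMD_ANY_STORAGE constant are assumptions for illustration:

  function mcp_config_routes(pools)
      mcp.attach(mcp.CMD_ANY_STORAGE, function(r)
          -- execute the request against the proxy's own cache instance
          local res = mcp.internal(r)
          return res
      end)
  end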
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`mcp.pool(p, { dist = etc, iothread = true })`
By default the IO thread is not used; instead a backend connection is
created for each worker thread. This can be overridden by setting
`iothread = true` when creating a pool.
`mcp.pool(p, { dist = etc, beprefix = "etc" })`
If a `beprefix` is added to pool arguments, it will create unique
backend connections for this pool. This allows you to create multiple
sockets per backend by making multiple pools with unique prefixes.
There are legitimate use cases for sharing backend connections across
different pools, which is why that is the default behavior.
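A short sketch of the options side by side, as it might appear inside
mcp_config_pools; backend details and variable names are illustrative:

  local b1 = mcp.backend({ label = "b1", host = "127.0.0.1", port = 11211 })
  local backends = { b1 }
  -- default: per-worker backend connections, shareable with other pools
  local shared = mcp.pool(backends)
  -- route this pool's backend traffic through the dedicated IO thread
  local io_pool = mcp.pool(backends, { iothread = true })
  -- give this pool its own backend sockets by tagging them with a prefix
  local own = mcp.pool(backends, { beprefix = "ext_" })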
|
|
|
|
|
|
|
| |
Response object error conditions were not being checked before looking
at the response buffer. If a response was partially filled and then the
backend timed out, a partial response could be sent instead of the
proper backend error.
|
|
|
|
|
|
|
|
|
|
| |
i.e.:
local b1 = mcp.backend({ label = "b1", host = "127.0.0.1", port = 11511,
connecttimeout = 1, retrytimeout = 0.5, readtimeout = 0.1,
failurelimit = 11 })
... to allow for overriding connect/retry/etc tunables on a per-backend
basis. If not passed in, the global settings are used.
|
|
|
|
|
|
|
|
|
|
|
| |
- specifically the WSTAT_DECR in proxy_await.c's return code could
potentially use the wrong thread's lock
This is why I've been swapping c with thread as lock/function arguments
all over the code lately; it's very accident-prone.
I am reasonably sure this causes the deadlock, but I need to attempt to
verify it further.
|
|
|
|
|
|
| |
We were duck-typing the response code for a coroutine yield before. It
also piled up ad-hoc logic for overriding IOs in certain cases. This
now makes everything explicit and clearer.
|
|
|
|
|
|
|
|
| |
- removes unused "completed" IO callback handler
- moves primary post-IO callback handlers from the queue definition to
the actual IO objects.
- allows IO object callbacks to be handled generically instead of based
on the queue they were submitted from.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Originally I envisioned taking an inbound request object, tagging it
with the time, and calling logging at the very end of a function. This
would give you the total time of the "backend" part of a request.
On rethinking, the timing information that's most useful from the
proxy's perspective is the time it takes for a response to happen plus
the status of that response. One request may generate several
sub-responses, and with the original approach it was impossible to check
the timing of each of those and log outliers.
You can no longer get the total time elapsed in a function, but I
believe that is less useful information to the user of a proxy. The best
picture of latency will still be from the client, and response latency
can educate the proxy on issues with backends.
resp:elapsed() has been added as a compromise; it returns the elapsed
microseconds that a response took, so you can add the times together and
get an approximation of total time (if running req/resps sequentially).
This change also means that calling mcp.await() and waiting for multiple
responses will give the timing of each sub-response accurately.
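A minimal sketch of reading the elapsed time inside a route function;
the pool call, the threshold, and the mcp.log() usage are illustrative
assumptions:

  function route_handler(r, pool)
      local res = pool(r)
      -- elapsed microseconds for this particular sub-response
      local took = res:elapsed()
      if took > 5000 then
          mcp.log("slow backend response: " .. took .. "us")
      end
      return res
  end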
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was a nightmare to debug; I need some better tools here.
1) There's a helper routine that ensures the lua coroutine is cleared up
if an error happens while handling the network/etc.
2) On reading the value data from a set request, there's one last error
that can happen before the coroutine ownership is taken from the
connection object.
3) The bug was the set read completion code was unreferencing the
coroutine, but could still throw an error if the set data was
malformed.
4) Thus it would double free the reference.
5) Then really weird things would happen to the registry: the same
reference ID would get handed out twice.
6) This blows up code later on as it gets data it doesn't expect, and
some referenced objects get clobbered.
7) This was triggered in combination of an earlier bug that would cause
bad data chunks on short writes in certain situations.
It took a long time to get a repro case outside of a benchmark; I was
looking in the wrong place.
|
|
|
|
|
| |
Allow users to differentiate thread functions externally to memcached.
Useful for setting priorities or pinning threads to CPUs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improvements to handling of new and failed backend socket connections.
Previously connections were initiated immediately, and initially from
the config thread, yet completion of opening sockets wouldn't happen
until a request tried to use that backend.
Now we open connections via the IO thread, as well as validate new
connections with a "version\r\n" command.
Also fixes a couple of error conditions (parsing, backend disconnect)
where clients could hang waiting for a retry time in certain conditions.
Now connections should re-establish immediately and dead backends should
flip into a bad fast-fail state quicker.
|
|
|
|
| |
Should make isolation/testing easier.
|
|
|
|
|
|
|
| |
Upstream fixes: mcmc would return OK for garbage responses, which was
probably causing issues in the past.
This removes MCMC_CODE_MISS and replaces it with MCMC_CODE_END.
|
|
|
|
|
|
|
|
| |
Allows using tagged listeners (ex: `-l tag[test]:127.0.0.1:11212`) to
select a top-level route for a function.
This expects there not to be dozens of listeners, but for a handful it
will be faster than a hash table lookup.
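A sketch of how a per-tag route might be attached; the tag argument to
mcp.attach() and the pool name are assumptions for illustration and may
not match the exact API:

  -- listener started with: -l tag[test]:127.0.0.1:11212
  function mcp_config_routes(pools)
      -- hypothetical: attach this route only for the "test" listener tag
      mcp.attach(mcp.CMD_ANY_STORAGE, function(r)
          return pools.test_pool(r)
      end, "test")
  end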
|
|
|
|
|
|
|
|
|
| |
Delete the magic logging and require that mcp.log_req* be used if you
want those types of entries to appear. Keeps a separate data stream from
"proxyuser" just in case that's useful.
proxycmds wasn't able to get enough context to autogenerate useful log
lines, so I'd rather not have it in there at all.
|
|
|
|
|
|
| |
Lua level API for logging full context of a request/response. Provides
log_req() for simple logging and log_reqsample() for conditional
logging.
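A rough sketch of the simple variant from inside a route function; the
argument order shown (request, response, detail) is an assumption and
may differ from the real signature:

  function route_handler(r, pool)
      local res = pool(r)
      -- hypothetical argument order: request, response, detail string
      mcp.log_req(r, res, "demo")
      return res
  end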
|
|
|
|
|
|
|
|
| |
Previously mcp.await() only worked if it was called before any other
dispatches.
Also fixes a bug where the supplied pool table was key=value instead of
an array-type table.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoids sending the response to the client, in most cases. Works by
stripping the noreply status from the request before sending it along,
so the proxy itself knows when to move the request forward.
Has sharp edges:
- only looking at the request object that's actually sent to the
backend, instead of the request object that created the coroutine.
- overriding tokens in Lua to re-set the noreply mode would break the
protocol.
So this change helps us validate the feature, but solidifying it requires
moving it to the "edges" of processing: before the coroutine and after
any command assembly (or within the command assembly).
|
|
|
|
|
|
|
|
|
| |
Now's a good time to at least shove functional subsections of code into
their own files. Some further work to clearly separate the APIs will
help, but it doesn't look too terrible.
The big bonus is getting the backend handling code away from the frontend
handling code, which should make it easier to follow.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds MCP_AWAIT_* options as 4th argument. If waiting for a subset of
requests instead of all requests, the 4th argument can specify what type
of responses are considered valid.
Also now includes _all_ observed results in the result table, when a
specific number of "good" results are requested. The caller is supposed
to filter for its intent. This allows observation of error codes if all
requested results fail or it otherwise fails to meet its result
conditions (via AWAIT_*).
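A sketch of the call shape, assuming the third argument is the number of
wanted responses, the mode constant is exposed to Lua as mcp.AWAIT_GOOD,
and res:ok() exists; the pool names are illustrative:

  function route_handler(r, pools)
      -- wait for one "good" response across three illustrative pools
      local restable = mcp.await(r, { pools.z1, pools.z2, pools.z3 }, 1, mcp.AWAIT_GOOD)
      -- the table contains all observed results; filter for the intent
      for _, res in ipairs(restable) do
          if res:ok() then
              return res
          end
      end
      return restable[1]
  end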
|
|
|
|
|
| |
Global-only setting; not able to be changed for existing backends on
config reload. A few fixes would improve that, but they can be done later.
|
|
|
|
|
|
|
| |
The fix is a little subtle: there's a race condition when a user stat is
added to the global context but the worker threads aren't updated yet.
So we check that the context in the user thread has enough stats slots
to iterate, and that the user thread has stats at all.
|
| |
|
|
|
|
|
|
|
| |
I wanted to do this via Lua with some on-close hooks on the coroutine,
but this might work for now. Not 100% sure I caught all of the incr/decr
cases properly. I was also trying to avoid hitting the counters too
hard.
|
| |
|
|
|
|
|
|
|
| |
Adds watch commands for:
proxycmds - internal raw timing log (tbd?)
proxyevents - config updates, internal errors, etc
proxyuser - logs generated by mcp.log()
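For example, a route function can feed the proxyuser stream with
mcp.log(); the r:key() call and pool argument are assumptions for
illustration:

  function route_handler(r, pool)
      -- lines written with mcp.log() show up under "watch proxyuser"
      mcp.log("routing key " .. r:key() .. " to the default pool")
      return pool(r)
  end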
|
|
|
|
| |
added to "stats proxy" output. counters of commands seen at inbound.
|
|
|
|
|
|
|
| |
A couple of punts as well. Added malloc checking for hot paths, but
deferred it for uncommon paths that were a bit harder.
Also hardens the request parser against some underflows/overflows.
|
|
|
|
|
| |
The uring bits make abstracting this into the obvious common function a
little harder, so I skipped it for now.
|
|
|
|
|
|
|
| |
Two builtin filter options (hash stop and tag), because why not :)
Hash defaults caused some code reorganization. The default hash dist is
now jump, because I can't think of why you'd use modulus over that.
|
|
|
|
|
|
|
| |
- fixes potential memory leaks if an error is generated while creating a
pool object.
- misc comment updates and error handling.
- avoid crash if attempting to route commands that don't have a key.
|
|
|
|
|
|
|
|
|
| |
We're limited in what we can do to fully emulate the original daemon
behavior, so we end up inlining error messages. Hopefully clients can
discern the difference, as they should be looking for ^VALUE anyway.
Also removes an error message I added for blank responses. Apparently
multiget is what I had that path for, heh!
|
| |
|
|
|
|
|
|
|
|
|
|
| |
I think this only counts failures related to the connect routine, and when a
backend is considered dead. Looks like I was partway through deciding
how to change that.
I'm sure it's possible for something to fail in such a way that it
connects, then immediately fails, which escapes this routine. Separate
fix for that though.
|
|
|
|
|
| |
Refactors the timeouts management code to be generic "tunables" and adds
the backend limit as a configuration option.
|
|
|
|
|
|
|
| |
If a conn goes to sleep while reading set data from the network, its
coroutine would be lost, causing a crash/corruption on the next resume.
This now properly handles the lifetime/cleanup of the coroutine.
|
|
|
|
|
|
|
| |
The proxy does actually accept requests with "\n" and not "\r\n", so this
parser could have read past the end of the input buffer.
Adding the end token to the parser would be a better fix, though.
|
|
|
|
|
|
|
| |
The configuration reloader copies data between the "config" global VM
and individual worker VMs. This fixes some crashes/limitations and
improves error handling. It still has a sharp edge as the actual table
copy is unhandled.
|
|
|
|
|
|
|
| |
- check eventfd error.
- error on missing 'mcp_config_pools' function in config file (a minimal
config shape is sketched below).
- note that Lua throws top-level errors on malloc failures from
lua_newuserdatauv.
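As a reference point, a minimal config file shape that provides the
required entry points; the mcp_config_routes function, the
CMD_ANY_STORAGE constant, and all names/values here are illustrative
assumptions:

  function mcp_config_pools()
      local b1 = mcp.backend({ label = "b1", host = "127.0.0.1", port = 11211 })
      return { main = mcp.pool({ b1 }) }
  end

  function mcp_config_routes(pools)
      mcp.attach(mcp.CMD_ANY_STORAGE, function(r)
          return pools.main(r)
      end)
  end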
|
|
|
|
|
|
|
|
|
| |
Marks TODO/FIXME items that can be done after MVP/V1 status is achieved.
"v2" just means "after v1". Not at all tied to a specific version or any
order.
This cuts us down from 170 items to... 80...
|