A change was accidentally introduced where part of the graceful shutdown
code (stop_threads() -> close_conns()) would execute during a
non-graceful shutdown (INT/TERM). This could lead to hangs or bugs when
using code that does not support graceful shutdown (the proxy).
This does not restore the old method of immediately exiting: it still
frees some memory and returns from main(), but it no longer attempts to
stop all of the worker threads.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We have a bug where updating a token and then requesting it again
returns the previous token. There are also other branches which require
use of the flattened request.
This removes the extra allocation space for lua request objects, as
we're no longer flattening into the end of the memory.
I was originally doing this with a lot of lua, but just copying the
string a few times has better properties:
1) it should actually be faster, with less lua and fewer allocations
2) it can be optimized to do minimal copying (avoid keys, append new
flags, etc.)
|
|
|
|
I'm a bit concerned that the compiler warning popped up only after I
started changing something else. Need to upgrade my OS again? :(
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Originally I envisioned taking an inbound request object, tagging it
with the time, and logging at the very end of a function call. This
would give you the total time of the "backend" part of a request.
On rethinking, the timing information that's most useful from the
proxy's perspective is the time it takes for a response to happen, plus
the status of that response. One request may generate several
sub-responses, and with that approach it would be impossible to check
the timing of each of those and log outliers.
You can no longer get the total time elapsed in a function, but I
believe that is less useful information to the user of a proxy. The best
picture of latency will still come from the client, and response latency
can educate the proxy about issues with backends.
resp:elapsed() has been added as a compromise; it returns the elapsed
microseconds that a response took, so you can add the times together and
get an approximation of total time (if running req/resps sequentially).
This change also means that calling mcp.await() and waiting for multiple
responses will give the timing of each sub-response accurately.
|
|
|
|
|
|
|
|
- The await return process uses the "main" VM to move the response into
the table we will eventually return to the user.
- During the reload routine a nil can be left on the top of the main VM
stack.
- It is safest to just use top-relative indexing in most cases where we
use the main VM and aren't explicitly clearing the stack beforehand.
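A minimal sketch of the difference, using the standard Lua C API (the
function below is illustrative, not the proxy's actual code):

    #include <lua.h>

    /* Absolute indices (e.g. lua_pushvalue(L, 1)) break if a reload left
     * a stray nil at the bottom of the stack, shifting everything up.
     * Top-relative (negative) indices refer to what was just pushed, no
     * matter what already sits lower on the stack. */
    static void move_top_into_table(lua_State *L) {
        /* stack on entry: ... resp */
        lua_createtable(L, 1, 0);   /* ... resp table */
        lua_pushvalue(L, -2);       /* ... resp table resp */
        lua_rawseti(L, -2, 1);      /* table[1] = resp; ... resp table */
        lua_remove(L, -2);          /* ... table */
    }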
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Expands on the io_uring port for the IO thread, moving write calls
behind the ring.
Unfortunately this does not result in any performance uplift. I also
tried moving the eventfd-related writes (which was backed out later).
The code still needs to be updated to generate SQEs outside of the CQE
loop (see comments above the uring event loop).
It is still generally buggy and not to be used, but keeping it up to
date every once in a while will let us use it if it ever does become
performant enough...
This also makes use of a prior optimization to simplify the write flush
code a little: it no longer needs to track how many bytes it intended
to write in order to know whether it should continue flushing later.
|
|
|
|
|
|
|
Updates the io_uring code to match the updates on the libevent side.
Needs more work before merge:
- audit error conditions
- try harder for some code deduplication
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A backend's connection object is technically owned by the IO thread
after it has been created. An error in how this was done led to invalid
backends being infinitely retried despite the underlying object having
been collected.
This change adds an extra indirection to backend objects: a backend_wrap
object, which turns the backend connection into an arbitrary pointer
instead of lua memory owned by the config VM.
- When backend connections are created, this pointer is shipped to the
IO thread to have its connection instantiated.
- When the wrap object is garbage collected (i.e., no longer referenced
by any pool object), the backend connection pointer is again shipped to
the IO thread, which then removes any pending events, closes the socket,
and frees the data.
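A minimal sketch of the indirection pattern; backend_wrap and
io_thread_submit_close are illustrative names, not the proxy's actual
identifiers:

    #include <lua.h>

    struct backend_conn;                                  /* owned by the IO thread */
    void io_thread_submit_close(struct backend_conn *be); /* hypothetical helper */

    /* The lua-owned userdata holds only a plain pointer; the connection
     * itself is ordinary heap memory that the config VM never owns. */
    typedef struct {
        struct backend_conn *be;
    } backend_wrap;

    /* __gc metamethod: once no pool references the wrapper, ship the
     * pointer to the IO thread to drop pending events, close the
     * socket, and free the connection. */
    static int backend_wrap_gc(lua_State *L) {
        backend_wrap *w = lua_touserdata(L, 1);
        if (w->be != NULL) {
            io_thread_submit_close(w->be);
            w->be = NULL;
        }
        return 0;
    }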
|
|
|
|
The disconnect routine wasn't zeroing out the sockfd, so it could have
been called twice on error paths.
|
|
|
|
|
|
|
|
There's at least one obscure code path where be->io_next was being
reset and then immediately flushed, which I found after running a
torture test for a while.
I decided to just armor the write prep function to seed be->io_next if
it's missing, which has been working without crashing since.
|
|
|
|
|
|
1) More IOVs per syscall.
2) If a backend got a large stack of pending IOs plus continual writes,
the CPU usage of the IO thread would bloat while looping past
already-flushed IO objects.
|
|
|
|
|
mcp.await(request, pools, 0, mcp.AWAIT_BACKGROUND) will, instead of
waiting on any request to return, simply return an empty table as soon
as the background requests are dispatched.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This was a nightmare to debug; I need some better tools here.
1) There's a helper routine that ensures the lua coroutine is cleaned up
if an error happens while handling the network/etc.
2) On reading the value data from a set request, there's one last error
that can happen before the coroutine ownership is taken from the
connection object.
3) The bug: the set read completion code was unreferencing the
coroutine, but could still throw an error if the set data was
malformed.
4) Thus it would double-free the reference.
5) Then really weird things would happen to the registry: the same
reference ID would get handed out twice.
6) This blows up code later on, as it gets data it doesn't expect and
some referenced objects get clobbered.
7) This was triggered in combination with an earlier bug that would
cause bad data chunks on short writes in certain situations.
It took a long time to get a repro case outside of a benchmark; I was
looking in the wrong place.
|
|
|
|
|
|
proxy_req_active shows the number of active proxy requests, but if those
proxy requests make sub-requests via mcp.await(), those are not
accounted for. This gives the number of active awaits, but not the total
number of in-flight async requests.
|
|
|
The previous code would disconnect the backend on a short read.
|
|
|
|
|
|
|
|
If we got a partial write while flushing commands to the backend, it was
not advancing the IOV pointer properly and would re-flush the same data.
This would usually lead to a "CLIENT_ERROR bad data chunk" from the
server for set commands.
I super love/hate scatter/gather code...
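For reference, a generic sketch of the correct pattern (not memcached's
actual flush routine): after a short writev() the fully written iovecs
must be consumed and the partially written one advanced, instead of
re-sending the same data:

    #include <errno.h>
    #include <sys/uio.h>

    static int flush_iovs(int fd, struct iovec *iov, int iovcnt) {
        while (iovcnt > 0) {
            ssize_t n = writev(fd, iov, iovcnt);
            if (n < 0) {
                if (errno == EINTR) continue;
                return -1; /* caller handles EAGAIN and hard errors */
            }
            /* consume iovecs that were written completely */
            while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
                n -= iov->iov_len;
                iov++;
                iovcnt--;
            }
            /* partial write inside the current iovec: advance it */
            if (iovcnt > 0) {
                iov->iov_base = (char *)iov->iov_base + n;
                iov->iov_len -= n;
            }
        }
        return 0;
    }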
|
|
|
|
|
|
|
Previously the "parsing" bucket held all backend state machine errors.
This cleans up an unused state machine branch and adds specific errors.
There still isn't a good way of getting the mcmc parsing error without
a debugger, but that path needs more work regardless.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The prior fix to backend validation was incomplete: there was always a
race during connection, because the logic checking the resulting flags
was in the wrong place.
Thus, when flooding with traffic during a reconnect it was still
possible to write requests down the pipe or to overwrite the event
handler.
This now skips both flushing and event handling for certain while
connecting or validating (which I will likely merge into a single
"valid" flag for a small speedup later).
I was expecting a bug in the validation code but not in the existing
code; that race window is very short in my test rig, so I was unable
to reproduce it. Using a lot more traffic worked, however.
|
|
|
|
|
|
|
|
|
|
Two bugs in the backend validation stage:
1) If requests are coming in while validation is happening, the event
handler will be reset to the main handler, which would then get a
VERSION response and blow up with a parsing error.
2) If the validation request results in an EAGAIN, it would
automatically pass validation, which would lead to the same bug as
above.
|
|
|
|
|
If for some reason a res object is not passed into mcp.log_req or
mcp.log_reqsample, it would still strlen() the backend name/port
strings, which could crash.
|
|
|
|
|
|
|
|
Was not clearing an internal variable in the loop that reads listeners,
so if you used a single -l argument to create several tagged listeners
and the first tag was longer than the next tag, the next tag would not
be read properly.
tag tag tag.
|
|
|
To help with troubleshooting when users end up with core dumps.
|
|
|
|
Allow users to differentiate thread functions externally to memcached.
Useful for setting priorities or pinning threads to CPUs.
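A minimal sketch of the mechanism (glibc/Linux API; the thread name
shown is illustrative, not necessarily what memcached assigns):

    #define _GNU_SOURCE
    #include <pthread.h>

    /* Linux thread names are capped at 15 characters plus the NUL.
     * Named threads show up in top -H and /proc/<pid>/task/<tid>/comm,
     * so they can be pinned or re-prioritized from outside the
     * process. */
    static void name_this_thread(void) {
        pthread_setname_np(pthread_self(), "mc-worker");
    }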
|
|
|
|
|
Crept in on an earlier cleanup commit. For some reason pools get gc'ed
pretty quickly, but backends take several HUPs, so I didn't see this
until I sat here HUP'ing the proxy 10+ times in a row.
|
|
|
|
The acknowledgement clause was removed a long time ago, so it's no
longer necessary to include it in the binary output.
|
|
|
|
|
Fix unused variable error when dtrace is not enabled.
Add void parameter declaration for drop_privileges() in darwin_priv.c.
|
|
|
|
This was a temporary hidden option for a few early adopters to use while
migrating from OK to HD status codes.
|
|
|
|
|
|
|
"mg" required at least one flag. Now "mg key" returns a bare HD on hit
if you don't care about the value.
HD modes would reflect the O and k flags in the response, but EN didn't.
This is now fixed for full coverage.
|
|
|
For HD/NF/etc. responses, but not VA.
|
|
|
Was returning "HD \r\n" and "HD Oetc\r\n", which is not to protocol
spec.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Improvements to the handling of new and failed backend socket
connections.
Previously, connections were initiated immediately (and initially from
the config thread), yet completion of opening a socket wouldn't happen
until a request tried to use that backend.
Now we open connections via the IO thread, and validate new connections
with a "version\r\n" command.
This also fixes a couple of error conditions (parsing, backend
disconnect) where clients could hang waiting for a retry timeout in
certain conditions.
Connections should now re-establish immediately, and dead backends
should flip into a bad fast-fail state more quickly.
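A sketch of the validation idea, shown synchronously for brevity (the
proxy actually drives this through its event-driven state machine;
validate_backend is an illustrative name):

    #include <string.h>
    #include <unistd.h>

    /* A fresh socket isn't handed to requests until a version command
     * round-trips successfully. */
    static int validate_backend(int fd) {
        char buf[64];
        ssize_t n;
        if (write(fd, "version\r\n", 9) != 9)
            return -1;
        n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            return -1;
        buf[n] = '\0';
        /* any "VERSION <ver>\r\n" response marks the backend usable */
        return strncmp(buf, "VERSION ", 8) == 0 ? 0 : -1;
    }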
|
|
|
|
|
|
|
|
|
Without this patch, tests failed with:
# Failed test 'saw a real log line after a skip'
# at t/watcher.t line 54.
# 'ts=-2126224733.765946 gid=80003 type=item_get key=foo status=not_found clsid=0 cfd=23 size=0
# '
# doesn't match '(?^:ts=\d+\.\d+\ gid=\d+ type=item_get)'
|
|
|
|
|
|
|
|
|
|
clang 15+ has started diagnosing these as errors:
thread.c:925:18: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
void STATS_UNLOCK() {
                 ^
                  void
Signed-off-by: Khem Raj <raj.khem@gmail.com>
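The fix is an explicit (void) parameter list; in C17 and earlier an
empty list declares a function with unspecified parameters, which
-Wstrict-prototypes rejects. A sketch of the corrected form (the body
shown is illustrative):

    void STATS_UNLOCK(void) {
        pthread_mutex_unlock(&stats_lock);
    }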
|
|
|
Returns early on a hit, else waits for N non-error responses.
|
|
|
Should make isolation/testing easier.
|
|
|
|
|
|
Upstream fixes: mcmc would return OK for garbage responses, which was
probably causing issues in the past.
This also removes MCMC_CODE_MISS and replaces it with MCMC_CODE_END.
|
|
|
|
|
|
|
When allocating sub-max chunks for the tail end of a large item, the
allocator would only look at the exact slab class. If the items in a
cache are all exclusively large, these slab classes could be empty. Now,
as a fallback, it will also check and evict from the largest slab class,
even if it doesn't necessarily want the largest chunk.
|
|
|
|
|
|
By default OpenSSL uses static, large read/write buffers for TLS
connections. For memcached instances with a lot of client connections
this can quickly add up to gigabytes of memory. This option allows the
buffers to be released when clients are idle.
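A minimal sketch of the OpenSSL mode involved; the wiring to memcached's
runtime option is not shown:

    #include <openssl/ssl.h>

    /* With SSL_MODE_RELEASE_BUFFERS set, OpenSSL frees a connection's
     * read/write buffers whenever they are empty, rather than holding
     * large static buffers for the life of the connection. */
    static void enable_tls_buffer_release(SSL_CTX *ctx) {
        SSL_CTX_set_mode(ctx, SSL_MODE_RELEASE_BUFFERS);
    }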
|
|
|
ref: #884
|
|
|
|
Fixes failing tests and scenarios where a lot of memory is freed up at
once.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
extstore has a background thread which examines slab classes for items
to flush to disk. The thresholds for flushing to disk are managed by a
specialized "slab automove" algorithm. This algorithm was written in
2017 and has not been tuned since.
Most serious users set "ext_item_age=0" and force-flush all items. This
is partially because the defaults do not flush aggressively enough,
which causes memory to run out and evictions to happen.
This change simplifies the slab automove portion. Instead of balancing
free chunks of memory per slab class, it sets a target of a certain
number of free global pages.
The extstore flusher thread also uses the page pool and some low-chunk
limits to decide when to start flushing. Its sleep routines have also
been adjusted, as it could oversleep too easily.
A few other small changes were required to avoid moving slab pages
around too much.
|
|
|
|
At least FreeBSD has perl in /usr/local/bin/perl and no symlink by
default.
|
|
|
Also Linux.
|
|
|
|
Detects arm64 crc32 hardware support.
|
|
|
|
|
The generated header comes with $ IDs, thus breaking the build.
The probes are already set with const address arguments, which would
just add the qualifier again.
|
|
|
|
Introduced during removal of strcat() calls. It seems to only affect
older GCCs, so it may be a misfiring warning that has since been
removed.
|
|
|
|
|
Use pkg-config to retrieve openssl dependencies such as -latomic or -lz.
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
|