author | dormando <dormando@rydia.net> | 2023-01-22 15:11:18 -0800 |
---|---|---|
committer | dormando <dormando@rydia.net> | 2023-01-22 15:11:18 -0800 |
commit | ac55ac888e6252836ca4a233daf79253934c5728 (patch) | |
tree | 025bc5b2851ac646e1564dcf4db49bca6444447b /t/issue_192.t | |
parent | 3b9f7300b012805beaa6e0f01fcfde2ddcfe6b01 (diff) | |
proxy: fix mismatched responses after bad write
Regression from the IO thread performance fix (again...) back in early
December.
Was getting corrupt backends if IOs were flushed in a very specific
way, which would give bad data to clients. Once traffic stopped, the
backends would time out (waiting for responses that were never coming)
and reset themselves.
The optimization added was to "fast skip" IOs that were already flushed
to the network by tracking a pointer into the list of IOs.
The bug requires a series of events:
1) the "prep write command" function notes a pointer into the top of the
backend IO stack.
2) a write to the backend socket results in an EAGAIN (no bytes written,
try again later).
3) reads then complete from the backend, changing the list of IO objects.
4) "prep write command" tries again from the pointer noted in step 1,
which is now invalid.
The fix:
1) only set the offset pointer _post flush_ to the last specifically
non-flushed IO object, so if the list changes it should always be
behind the IO pointer.
2) the IO pointer is nulled out immediately if flushing is complete.
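The two rules above can be sketched in a small, self-contained model. This is
not memcached's code: `io_t`, `backend_t`, `be_enqueue`, `be_flush` and
`be_complete_head` are hypothetical names, the socket write is simulated by a
`budget` of whole IOs (0 stands in for EAGAIN), and "last specifically
non-flushed IO" is read here as "the first IO that has not been flushed yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not memcached's real structures. */
typedef struct io {
    struct io *next;
    bool flushed;      /* request fully written to the backend socket */
} io_t;

typedef struct {
    io_t *head;        /* oldest IO still waiting for a response */
    io_t *tail;
    io_t *io_next;     /* fast-skip hint: first unflushed IO, or NULL */
} backend_t;

/* Append a new, unflushed IO. The hint is deliberately left alone. */
static void be_enqueue(backend_t *be, io_t *io) {
    io->next = NULL;
    io->flushed = false;
    if (be->tail) be->tail->next = io; else be->head = io;
    be->tail = io;
}

/* Simulated flush: the socket accepts at most `budget` IOs.
 * budget == 0 models EAGAIN. Returns the number of IOs flushed. */
static int be_flush(backend_t *be, int budget) {
    io_t *io = be->io_next ? be->io_next : be->head;
    int written = 0;

    /* Slow path when there is no hint: skip IOs already on the wire. */
    while (io && io->flushed) io = io->next;

    while (io && written < budget) {
        io->flushed = true;
        written++;
        io = io->next;
    }

    /* Rule 1: set the hint only post flush, to an unflushed IO.
     * Rule 2: if everything was flushed, io is NULL and the hint is cleared. */
    be->io_next = io;
    return written;
}

/* A response arrived: pop the head IO. Only flushed IOs can complete,
 * so the hint (which only ever names an unflushed IO) stays valid. */
static io_t *be_complete_head(backend_t *be) {
    io_t *io = be->head;
    if (!io || !io->flushed) return NULL;
    assert(io != be->io_next);
    be->head = io->next;
    if (!be->head) be->tail = NULL;
    io->next = NULL;
    return io;
}
```

In this model, a flush that hits the simulated EAGAIN leaves the hint on an
unflushed IO, and reads can only remove flushed IOs ahead of it, so the retry
never starts from a removed object.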
Took staring at it for a long time to understand this. I've rewritten
this change once.
I will split the stacks for "to-write queue" and "to-read queue" soon.
That should be safer.
Diffstat (limited to 't/issue_192.t')
0 files changed, 0 insertions, 0 deletions