path: root/memcached.h
...
* main: split text protocol into proto_text.c (dormando, 2020-07-02, 1 file, -0/+39)
  Exports a lot of the connection-handling code from memcached.c.
* fix leak in merged resp/read buffers (dormando, 2020-07-02, 1 file, -0/+2)
  The list grows toward next, not prev. It also wasn't zeroing out the next
  pointer, and didn't unmark "free" for the first resp object. (thanks
  Prudhviraj!) Adds a couple of counters so users can tell if something is
  wrong as well. response_obj_count is solely for response objects in flight;
  they should not be held when idle (except for one bundle per worker thread).
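  A minimal sketch of the corrected append path, with hypothetical field
  names (the real mc_resp layout is declared in this header):

      /* Sketch only: a per-connection response list that grows via ->next.
       * A recycled object with a stale ->next leaks the old chain. */
      #include <stddef.h>

      typedef struct _mc_resp {
          struct _mc_resp *next;
          int free;                     /* set when returned to the allocator */
      } mc_resp;

      typedef struct {
          mc_resp *resp_head;
          mc_resp *resp_tail;
      } conn;

      static void resp_append(conn *c, mc_resp *r) {
          r->next = NULL;               /* was missing: stale pointer leaked */
          r->free = 0;                  /* first object must be unmarked too */
          if (c->resp_tail != NULL) {
              c->resp_tail->next = r;   /* grow toward next, not prev */
          } else {
              c->resp_head = r;
          }
          c->resp_tail = r;
      }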
* net: remove most response obj cache related code (dormando, 2020-06-09, 1 file, -6/+2)
  Accept but warn if the command-line option is used. Also keeps the OOM
  counter distinct so users have a chance of telling what type of memory
  pressure they're under.
* net: carve response buffers from read buffers (dormando, 2020-06-09, 1 file, -0/+12)
  Balancing start arguments for memory limits on read vs. response buffers is
  hard. With this change there's a custom sub-allocator that cuts the small
  response objects out of larger read objects. It is somewhat complicated and
  has a few loops. The allocation path is short-circuited as much as possible.
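  A rough sketch of the carving idea (names and sizes hypothetical, not the
  actual allocator):

      /* Carve small, fixed-size response slots from the tail of a larger
       * read buffer; a per-slot flag tracks which are in use. */
      #include <stdlib.h>

      #define RESP_SLOT_SIZE 512
      #define RESP_SLOTS     8

      typedef struct {
          char *rbuf;               /* the large read buffer */
          size_t rbuf_size;         /* bytes left over for actual reads */
          char used[RESP_SLOTS];    /* in-use flag per carved slot */
      } rbuf_bundle;

      static rbuf_bundle *bundle_new(size_t total) {
          rbuf_bundle *b = calloc(1, sizeof(*b));
          if (b == NULL) return NULL;
          b->rbuf = malloc(total);
          if (b->rbuf == NULL) { free(b); return NULL; }
          /* reserve the tail of the buffer for response slots */
          b->rbuf_size = total - (RESP_SLOT_SIZE * RESP_SLOTS);
          return b;
      }

      /* allocation path: one short loop, no syscalls, no locking */
      static void *resp_alloc(rbuf_bundle *b) {
          for (int i = 0; i < RESP_SLOTS; i++) {
              if (!b->used[i]) {
                  b->used[i] = 1;
                  return b->rbuf + b->rbuf_size + (size_t)i * RESP_SLOT_SIZE;
              }
          }
          return NULL;              /* caller counts an oom and falls back */
      }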
* Improve the sig_handler function (Tomas Korbar, 2020-05-12, 1 file, -0/+6)
  Add a new enumeration with reasons for stopping memcached. Make sure that
  restart_mmap_close gets called only when the right signal is received.
  Improve logging to stderr so the user knows exactly why memcached stopped.
* Add: `-o ssl_session_cache`, disabled by default (Kevin Lin, 2020-03-27, 1 file, -0/+2)
  Enables server-side TLS session caching.
* restart: fix rare segfault on shutdown (dormando, 2020-03-25, 1 file, -2/+1)
  Client connections were being closed and cleaned up after worker threads
  exit. In 2018 a patch went in to have the worker threads actually free
  their event base when stopped. If your system is strict enough (which is
  apparently none out of the dozen+ systems we've tested against!) it will
  segfault on invalid memory.

  This change leaves the workers hung while they wait for connections to be
  centrally closed. I would prefer to have each worker thread close its own
  connections for speed if nothing else, but we still need to close the
  listener connections and any connections currently open in side channels.

  Much appreciation to darix for helping narrow this down, as it presented as
  a wiped stack that only appeared in a specific build environment on a
  specific Linux distribution. Hopefully with all of the valgrind noise fixes
  lately we can start running it more regularly and spot these early.
* ssl_errors stat (Kevin Lin, 2020-03-16, 1 file, -0/+3)
* meta: arithmetic command for incr/decr (dormando, 2020-03-06, 1 file, -1/+2)
  See doc/protocol.txt. Needed slightly different code, as we have to
  generate the response line after the main operation completes.
* add separate limits for connection buffers (dormando, 2020-02-26, 1 file, -1/+7)
  Allows specifying a megabyte limit for either response objects or read
  buffers. This is split among all of the worker threads. Useful if the
  connection limit is extremely high and you want to aggressively close
  connections if something happens and all connections become active at the
  same time. Runtime tuning is still missing.
* network: transient static read buffer for conns (dormando, 2020-02-01, 1 file, -1/+6)
  Instead of starting at 2k and realloc'ing over and over every time you set
  a large item or do large pipelined fetches, use a slightly larger static
  buffer. Idle connections no longer hold a buffer, freeing up a ton of
  memory. To maintain compatibility with unbound ASCII multigets, those fall
  back to the old malloc/realloc/free routine, which it has used since the
  dark ages.
* network: refactor binprot's bizarre rbuf usage (dormando, 2020-02-01, 1 file, -2/+1)
  Since keys were (maybe) expanding to 64k, binprot overloaded the
  read-into-key "conn_nread" state to read... a specific number of bytes into
  the read buffer. However, keys were never expanded in size, so this ends up
  being a waste of code and attaches binprot to management of the read buffer
  directly. Instead, re-parse the binary header if we don't have enough bytes
  and let try_read_network() handle it like it does for ascii. As a side
  effect, this prevents multiple memmove's of potentially large amounts of
  data for pipelined binprot commands, and under NEED_ALIGN.
* network: response stacking for all commands (dormando, 2020-02-01, 1 file, -55/+58)
  This change refactors most of memcached's networking frontend code. The
  goals of this change:

  - Decouple buffer memory from client connections where plausible, reducing
    memory usage for currently inactive TCP sessions.
  - Allow easy response stacking for all protocols. Previously every binary
    command generated a syscall during response. This is no longer true when
    multiple commands are read off the wire at once. The new meta protocol
    and most text protocol commands are similar.
  - Reduce code complexity for network handling. Remove automatic buffer
    adjustments, error checking, and so on.

  This is accomplished by removing the iovec, msg, and "write buffer"
  structures from connection objects. A `mc_resp` object is now used to
  represent an individual response. These objects contain small iovec's and
  enough detail to be late-combined on transmit. As a side effect, UDP mode
  now works with extstore :)

  Adding to the iovec always had a remote chance of memory failure, so every
  call had to be checked for an error. Now, once a response object is
  allocated, most manipulations can happen without any checking. This is both
  a boost to code robustness and performance for some hot paths.

  This patch is the first in a series, focusing on the client response.
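  Illustrative sketch only (field names hypothetical; the real struct is
  declared in this header): each response owns a few fixed iovecs, and a
  whole stack of responses is gathered into one writev() at transmit time.

      #include <sys/types.h>
      #include <sys/uio.h>
      #include <stddef.h>

      #define RESP_IOV_COUNT 4

      typedef struct _mc_resp {
          struct _mc_resp *next;             /* stacked responses per conn */
          struct iovec iov[RESP_IOV_COUNT];  /* small and fixed: no realloc */
          int iovcnt;
      } mc_resp;

      /* once the object is allocated, adding data cannot fail */
      static void resp_add_iov(mc_resp *r, const void *buf, size_t len) {
          r->iov[r->iovcnt].iov_base = (void *)buf;
          r->iov[r->iovcnt].iov_len = len;
          r->iovcnt++;
      }

      /* late-combine: one writev per batch instead of one syscall per
       * command response */
      static ssize_t resp_transmit(int fd, mc_resp *head) {
          struct iovec out[64];
          int n = 0;
          for (mc_resp *r = head; r != NULL && n + r->iovcnt <= 64; r = r->next)
              for (int i = 0; i < r->iovcnt; i++)
                  out[n++] = r->iov[i];
          return writev(fd, out, n);
      }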
* Extended configuration option to disable watch commands (Carl Myers, 2020-01-21, 1 file, -0/+1)
  Suggested in issue #544. Implementation inspired by the flag which disables
  flush_all. A similar CLIENT_ERROR is generated should someone attempt a
  watch command while they are disabled.
* adding missing defaults to the --help output (Tharanga Gamaethige, 2020-01-21, 1 file, -0/+9)
* stats: Rename stats.c to stats_prefix.c (Kanak Kshetri, 2020-01-13, 1 file, -1/+1)
* Use a proper data type for settings.sig_hup (Daniel Schemmel, 2019-11-09, 1 file, -1/+2)
* slab rebalance performance improvements (Daniel Byrne, 2019-10-17, 1 file, -0/+1)
  For full discussion see: https://github.com/memcached/memcached/pull/524

  - Avoids looping in most cases where an item had to be force-freed
  - Avoids re-locking and re-checking already completed memory
  - Uses a backoff timer for sleeping when busy items are found
* meta text protocol commands (dormando, 2019-09-30, 1 file, -4/+12)
  - We get asked a lot to provide a "metaget" command, for various uses
    (debugging, etc).
  - We also get asked for random one-off commands for various use cases.
  - I really hate both of these situations and have been wanting to
    experiment with a slight tuning of how get commands work for a long time.

  Assuming that if I offer a metaget command which gives people the
  information they're curious about in an inefficient format, plus data they
  don't need, we'll just end up with a slow command with compatibility
  issues. No matter how many warnings you wrap around a command, people will
  put it into production under high load. Then I'm stuck with it forever.

  Behold, the meta commands! See doc/protocol.txt and the wiki for a full
  explanation and examples. The intent of the meta commands is to support any
  features the binary protocol had over the text protocol. Though this is
  still missing some commands, it is close, and it surpasses the binary
  protocol in many ways.
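  For a taste of the syntax (flag letters and response tokens per
  doc/protocol.txt; details have shifted between versions), a meta get asking
  for the value and remaining TTL of a 3-byte item might look like:

      mg foo v t
      VA 3 t94
      bar

  A miss returns a bare "EN" line instead of the VA response.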
* restartable cache (dormando, 2019-09-17, 1 file, -0/+2)
  "-e /path/to/tmpfsmnt/file"
  SIGUSR1 for graceful stop.

  Restart requires the same memory limit, slab sizes, and some other
  infrequently changed details. Most other options and features can change
  between restarts. The binary can be upgraded between restarts.

  Restart does some fixup work on start for every item in cache. This can
  take over a minute with more than a few hundred million items in cache.
  Keep in mind that while a cache is down it may be missing invalidations,
  updates, and so on.
* Add a handler for seccomp crashes (Stanisław Pitucha, 2019-08-28, 1 file, -0/+2)
  When seccomp causes a crash, use a SIGSYS action and handle it to print out
  an error. Most functions are not allowed at that point (no buffered output,
  no *printf functions, no abort, ...), so the implementation is as minimal
  as possible. Print out a message with the syscall number and exit the
  process (all threads).
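  A minimal sketch of such a handler (not the tree's exact code), sticking to
  async-signal-safe calls:

      #define _GNU_SOURCE            /* for si_syscall in siginfo_t */
      #include <signal.h>
      #include <unistd.h>
      #include <stdlib.h>

      static void sigsys_handler(int sig, siginfo_t *si, void *ctx) {
          (void)sig; (void)ctx;
          char msg[] = "seccomp: blocked syscall nr ";
          char num[16];
          int n = si->si_syscall, i = sizeof(num);
          num[--i] = '\n';
          do { num[--i] = '0' + (n % 10); n /= 10; } while (n > 0 && i > 1);
          /* only write() and _exit() are safe (and allowed) here */
          write(STDERR_FILENO, msg, sizeof(msg) - 1);
          write(STDERR_FILENO, num + i, sizeof(num) - i);
          _exit(EXIT_FAILURE);       /* exits all threads */
      }

      static void install_sigsys_handler(void) {
          struct sigaction sa = {0};
          sa.sa_sigaction = sigsys_handler;
          sa.sa_flags = SA_SIGINFO;
          sigaction(SIGSYS, &sa, NULL);
      }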
* Use correct buffer size for internal URI encoding. (Tharanga Gamaethige, 2019-05-20, 1 file, -0/+3)
  Modified Logger and Crawler to use the correct buffer length when they are
  printing URI-encoded keys. Fixes #471.
* widen internal item flags to 16bits. (dormando, 2019-05-20, 1 file, -13/+16)
  Did a weird dance. nsuffix is no longer an 8-bit length; it is replaced
  with an ITEM_CFLAGS bit, which indicates whether there is a 32-bit set of
  client flags in the item or not. Possible after removing the inlined ascii
  response header via the previous commit.
* remove inline_ascii_response option (dormando, 2019-05-20, 1 file, -5/+2)
  Has defaulted to false since 1.5.0, and with -o modern for a few years
  before that. Performance is fine, no reported bugs. This was always the
  intention. Code is simpler without the option.
* -Y [filename] for ascii authentication mode (dormando, 2019-05-20, 1 file, -0/+3)
  Loads "username:password\n" tokens (up to 8) out of a supplied authfile. If
  enabled, disables the binary protocol (though it may be possible to enable
  both if sasl is also used?). Authentication is done via the "set" command.
  A separate handler is used to avoid some hot path conditionals and to
  narrow the code executed in an unauthenticated state. ie:

  set foo 0 0 7\r\n
  foo bar\r\n

  returns "STORED" on success. Else returns CLIENT_ERROR with some
  information. Any key is accepted: if using a client that doesn't try to
  authenticate when connecting to a pool of servers, the authentication set
  can be retried with the same key as the one that failed, coercing the
  client into routing to the correct server. Otherwise an "auth" or similar
  key would always go to the same server.
* close delete + incr item survival bug (dormando, 2019-04-26, 1 file, -0/+1)
  re #469 - delete actually locks/unlocks the item (and hashes the key!)
  three times. In between fetch and unlink, a fully locked add_delta() can
  run, deleting the underlying item. DELETE then returns success despite the
  original object hopscotching over it. I really need to get to the frontend
  rewrite soon :(

  This commit hasn't been fully audited for deadlocks on the stats counters
  or the extstore STORAGE_delete() function, but it passes tests.
* Basic implementation of TLS for memcached. [1.5.13] (Tharanga Gamaethige, 2019-04-15, 1 file, -2/+32)
  Most of the work done by Tharanga. Some commits squashed in by dormando,
  who also reviewed. A tested, working, but experimental implementation of
  TLS for memcached.

  Enable with: ./configure --enable-tls
  Requires OpenSSL 1.1.0 or better.
  See `memcached -h` output for usage.
* expand NEED_ALIGN for chunked items (dormando, 2018-08-08, 1 file, -0/+17)
  Some whackarse ARM platforms on specific glibc/gcc (new?) versions trip
  SIGBUS while reading the header chunk for a split item. The header chunk is
  unfortunate magic: it lives in ITEM_data() at a random offset, is zero
  sized, and only exists to simplify code around finding the original slab
  class and linking/relinking subchunks to an item.

  There's no fix to this which isn't a lot of code. I need to refactor
  chunked items, and attempted to do so, but couldn't come up with something
  I liked quickly enough. This change pads the first chunk if alignment is
  necessary, which wastes bytes and a little CPU, but I'm not going to worry
  a ton about these obscure platforms.

  This works with rebalancing because, in the case of an ITEM_CHUNKED header,
  it treats the item size as the size of the class it resides in and
  memcpy's the item during recovery. All other cases were changed from
  ITEM_data to a new ITEM_schunk() inline function that is created when
  NEED_ALIGN is set; otherwise it's equal to ITEM_data.
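  The shape of the padding, as a generic sketch (not the real ITEM_schunk(),
  which works off the item struct): round the header-chunk offset up to the
  platform word size instead of reading at an arbitrary offset.

      #include <stdint.h>

      #ifdef NEED_ALIGN
      static inline char *schunk_ptr(char *data_start) {
          uintptr_t p = (uintptr_t)data_start;
          uintptr_t pad = (sizeof(void *) - (p % sizeof(void *)))
                          % sizeof(void *);
          return data_start + pad;  /* a few wasted bytes beats SIGBUS */
      }
      #else
      #define schunk_ptr(d) (d)
      #endif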
* fix ASCII get error handling (+ extstore leak) [1.5.9] (dormando, 2018-07-07, 1 file, -1/+3)
  Apparently since (forever?) the double while loop in process_get_command()
  would turn into an infinite loop (and leak memory/die) if add_iov() ever
  failed. The recently added get_extstore() is more likely to spuriously
  fail, so it turned into a problem. This creates a common path for the key
  length abort as well. Adds a test, which breaks several ways before this
  patch.
* add utility macro to replace repetitive code snippet. (Linkerist, 2018-06-27, 1 file, -0/+11)
* Fix segfault: Prevent calling sasl_server_step before sasl_server_start (Paul Furtado, 2018-06-04, 1 file, -0/+1)
  If sasl_server_step is called on a sasl_conn which has not had
  sasl_server_start called on it, it will segfault from reading uninitialized
  memory. Memcached currently calls sasl_server_start when the client sends
  the PROTOCOL_BINARY_CMD_SASL_AUTH command and sasl_server_step when the
  client sends the PROTOCOL_BINARY_CMD_SASL_STEP command. So if the client
  sends SASL_STEP before SASL_AUTH, the server segfaults.

  For well-behaved clients, this case never happens; but for the
  java-memcached-client, when configured with an incorrect password, it
  happens very frequently. This is likely because the client handles auth on
  a background thread and the socket may be swapped out in the middle of
  authentication. You can see that code here:
  https://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/auth/AuthThread.java
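  The guard itself is small; a sketch with hypothetical field names:

      #include <stdbool.h>

      typedef struct {
          bool sasl_started;   /* set after SASL_AUTH ran sasl_server_start */
          /* ... sasl_conn_t *sasl_conn; etc. ... */
      } conn;

      /* refuse SASL_STEP if the client skipped SASL_AUTH; previously this
       * fell through into sasl_server_step() on an uninitialized sasl_conn */
      static bool can_sasl_step(const conn *c) {
          return c->sasl_started;
      }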
* Drop supplementary groups in addition to setgid (Anthony Ryan, 2018-03-24, 1 file, -0/+1)
  When launching as root, we drop permissions to run as a specific user, but
  the way this is done fails to drop supplementary groups from the process.
  On many systems, the root user is a member of many groups used to guard
  important system files. If we neglect to drop these groups, it's still
  possible for a compromised memcached instance to access critical system
  files as though it were running with root-level permissions.

  On any given system, `grep root /etc/group` will reveal all the groups root
  is a member of, and memcached will have access to any file these groups are
  authorized for, in spite of our attempts to drop this access.

  It's possible to test if a given memcached instance is affected and running
  with elevated permissions by checking the respective line in procfs:
  `grep Groups /proc/$pid/status`. If any groups are listed, memcached would
  have access to everything the listed groups have access to. After this
  patch no groups will be listed and memcached will be locked down properly.
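  The standard fix, sketched: clear supplementary groups before
  setgid()/setuid(), while the process is still root.

      #include <grp.h>
      #include <pwd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      static void drop_to_user(const char *username) {
          struct passwd *pw = getpwnam(username);
          if (pw == NULL) {
              fprintf(stderr, "unknown user %s\n", username);
              exit(EXIT_FAILURE);
          }
          /* without this, the process keeps every group root belongs to */
          if (setgroups(0, NULL) != 0) { perror("setgroups"); exit(EXIT_FAILURE); }
          if (setgid(pw->pw_gid) != 0) { perror("setgid"); exit(EXIT_FAILURE); }
          if (setuid(pw->pw_uid) != 0) { perror("setuid"); exit(EXIT_FAILURE); }
      }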
* add_delta: use bool incr to be consistent with other functions (Fangrui Song, 2018-03-14, 1 file, -1/+1)
* extstore: tuning drop_unread semantics (dormando, 2017-12-15, 1 file, -0/+1)
  There's now an optional ext_drop_under setting which defaults to the same
  as compact_under, which should be fine. Now, if drop_unread is enabled, it
  only kicks in if there are no pages matching the compaction threshold. This
  allows you to set a lower compaction frag rate, then start rescuing only
  non-COLD items if storage is too full. You can also compact up to a point,
  then allow a buffer of pages to be used before dropping unread.

  Previously, enabling drop_unread would always drop unread items even when
  compacting normally, which limited the utility of the feature.
* extstore: close hole with storage tracking (dormando, 2017-12-13, 1 file, -1/+3)
  Items expired/evicted while pulling from the tail weren't being tracked,
  leading to a leak of object counts in pages.
* extstore: C version of automove algorithm (dormando, 2017-12-07, 1 file, -0/+1)
  A couple of TODO items are left for a new issue I thought of. The memory
  buffer size is also hardcoded and should be fixed. Also need to change the
  "free and re-init" logic to use a boolean in case any related option
  changes.
* extstore: experimental per-class free chunk limit (dormando, 2017-12-03, 1 file, -0/+2)
  External commands only for the moment. Allows specifying, per slab class,
  how many chunks to leave free before causing flushing to storage.

  The external page mover algo in previous commits has a few issues:
  - It relies too heavily on the page mover: lots of constant activity under
    load.
  - Adjusting the item age level on the fly is too laggy, and can easily
    over-free or under-free. IE; class 3 has TTL 90, but class 4 has TTL 60
    and most of the pages in memory; it won't free much until it lowers to
    60.

  Thinking this would be something like a percentage of total chunks in the
  slab class; easiest to set as a percentage of total memory, or by write
  rate periodically. From there, TTL can be balanced almost as in the
  original algorithm: keep a small global page pool for small items to
  allocate memory from, and pull pages from or balance between
  storage-capable classes to align TTL.
* extstore: ext_compact_under to control compaction (dormando, 2017-11-28, 1 file, -0/+1)
  Had a hardcoded value of "start to compact under a slew if more than
  3/4ths of pages are used", but this allows it to be set directly. ie; "I
  have 100 pages but don't want to compact until almost full, and then drop
  any unread".
* extstore: add ext_drop_unread option + live tune (dormando, 2017-11-28, 1 file, -0/+1)
  Was struggling to figure out how to automatically turn this on or off, but
  I think it should be part of an outside process. ie; a mechanism should be
  able to target a specific write rate, and one of its tools for reducing the
  write rate should be flipping this on.

  There's *still* a hole where you can't trigger a compaction attempt if
  there's no fragmentation. I kind of want, if this feature is on, to attempt
  a compaction on the oldest page while dropping unread items.
* extstore: crawler fix and ext_low_ttl option (dormando, 2017-11-28, 1 file, -0/+1)
  The LRU crawler was not marking reclaimed expired items as removed from the
  storage engine. This could cause fragmentation to persist much longer than
  it should, but would not cause any problems once compaction started.

  Adds the "ext_low_ttl" option. Items with a remaining expiration age below
  this value are grouped into special pages. If you have a mixed-TTL
  workload, this helps prevent low-TTL items from causing excess
  fragmentation/compaction. Pages with low-TTL items are excluded from
  compaction.
* extstore: configure and start time gating (dormando, 2017-11-28, 1 file, -2/+0)
  ./configure --enable-extstore to compile the feature in.
  Specify -o ext_path=/whatever to start.
* extstore: skip unhit objects if full in compaction (dormando, 2017-11-28, 1 file, -0/+5)
  If fewer than 2 free pages are left, "evict" objects which haven't been hit
  at all. This should be better than evicting everything, if we can continue
  compacting.
* external storage base commit (dormando, 2017-11-28, 1 file, -4/+61)
  Been squashing, reorganizing, and pulling code off to go upstream ahead of
  merging the whole branch.
* interface code for flash branch (dormando, 2017-09-26, 1 file, -2/+1)
  Removes a few ifdef's and upstreams small internal interface tweaks for an
  easy rebase.
* don't create hashtables larger than 32bit (dormando, 2017-08-27, 1 file, -0/+1)
  Most other stats related to items should be 64-bit, so the totals should be
  able to go higher until we get 64-bit hash tables.
* Add drop_privileges() for Linux (Stanisław Pitucha, 2017-08-23, 1 file, -0/+8)
  Implement an aggressive version of drop_privileges(), and additionally add
  a similar initialization function for threads, drop_worker_privileges().
  This version is similar to the Solaris one and prohibits memcached from
  making any unapproved syscalls. The current list narrows down the allowed
  calls to socket sends/recvs, accept, epoll handling, futex (and its
  dependencies - mmap), getrusage (for stats), and signal/exit handling. Any
  incorrect behaviour will result in EACCES being returned. This should be
  restricted further to KILL in the future (after more testing).

  The feature is only tested for i386 and x86_64. It depends on bpf filters
  and seccomp enabled in the kernel. It also requires libseccomp for the
  abstraction over seccomp filters. All are available since Linux 3.5.
  Seccomp filtering can be enabled at compile time with --enable-seccomp.

  In case of local customisations which require more rights, memcached allows
  disabling drop_privileges() with "-o no_drop_privileges" at startup. Tests
  have to run with "-o relaxed_privileges", since they require disk access
  after the tests complete. This adds a few allowed syscalls, but does not
  disable the protection system completely.
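  An abbreviated sketch of a libseccomp allowlist in this spirit (the real
  list in the tree is longer and per-arch):

      #include <errno.h>
      #include <seccomp.h>
      #include <stdlib.h>

      static void drop_syscalls(void) {
          /* default action: fail disallowed syscalls with EACCES */
          scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ERRNO(EACCES));
          if (ctx == NULL) exit(EXIT_FAILURE);
          int allowed[] = {
              SCMP_SYS(read), SCMP_SYS(write), SCMP_SYS(sendmsg),
              SCMP_SYS(recvfrom), SCMP_SYS(accept4), SCMP_SYS(epoll_wait),
              SCMP_SYS(epoll_ctl), SCMP_SYS(futex), SCMP_SYS(mmap),
              SCMP_SYS(getrusage), SCMP_SYS(exit_group), SCMP_SYS(rt_sigreturn),
          };
          for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
              if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, allowed[i], 0) < 0)
                  exit(EXIT_FAILURE);
          }
          if (seccomp_load(ctx) < 0) exit(EXIT_FAILURE);
          seccomp_release(ctx);
      }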
* hot_max_age is now hot_max_factor (dormando, 2017-06-24, 1 file, -1/+1)
  Defaults to 20% of COLD age. hot_max_age was added because many people's
  caches were sitting at 32% memory utilized (exactly the size of HOT).
  Capping the LRUs by percentage and age would promote some fairness, but I
  made a mistake in making WARM dynamic but HOT static. This is now fixed.
* add a real slab automover algorithm (dormando, 2017-06-23, 1 file, -0/+2)
  Converts the Python script to C, more or less.
* slab_rebal: delete busy items if stuck (dormando, 2017-06-23, 1 file, -0/+3)
  If we loop through a slab too many times without freeing everything, delete
  items stuck with high refcounts. They should bleed off so long as the
  connections aren't jammed holding them. It should be possible to force
  rescues in this case as well, but that's more code, so it will follow up
  later. Needs a big-ish refactor.
* per-LRU hits breakdown (dormando, 2017-06-04, 1 file, -0/+1)
  No actual speed loss. Emulates the slab_stats "get_hits" by totalling up
  the per-LRU get_hits. Could sub-LRU many more stats, but a different
  command/interface should be used for that.