path: root/memcached.c
Commit message | Author | Age | Files | Lines
* Make memcached setgroups failure non-fatal (jwbee, 2019-09-21; 1 file, -2/+12)
  A call to setgroups can fail on Linux if the user lacks cap_setgid or if setgroups has been disallowed in a user namespace. The latter happens when the namespace has no GID map or if /proc/pid/setgroups has been changed to "deny". Treating this failure as fatal means that memcached cannot start in a bazel fakeroot. Ignoring failure seems harmless.
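The non-fatal handling described above can be sketched as follows. This is an illustrative sketch, not memcached's actual code; the function name is hypothetical:

```c
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: try to clear supplementary groups, but only
 * warn if the kernel refuses (e.g. missing cap_setgid, or setgroups
 * denied inside a user namespace). Startup continues either way. */
static int clear_supplementary_groups(void) {
    if (setgroups(0, NULL) != 0) {
        /* Previously treated as fatal; now just a warning. */
        fprintf(stderr, "warning: setgroups failed: %s\n", strerror(errno));
    }
    return 0; /* non-fatal by design */
}
```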
* restart: fixes for 32bit and chunked items. [1.5.18] (dormando, 2019-09-17; 1 file, -1/+4)
  Severe bug in item chunk fixup (wasn't doing it at all!). Failed to check on my 32-bit builders, and 32-bit platforms weren't working at all. This is a bit of a kludge since I'm still working around having ptrdiff, but it seems to work. Also fixes a bug with a missing null byte for the meta filename.
* restartable cache (dormando, 2019-09-17; 1 file, -22/+419)
  "-e /path/to/tmpfsmnt/file"; SIGUSR1 for graceful stop. Restart requires the same memory limit, slab sizes, and some other infrequently changed details. Most other options and features can change between restarts. The binary can be upgraded between restarts. Restart does some fixup work on start for every item in cache; this can take over a minute with more than a few hundred million items in cache. Keep in mind that while a cache is down it may be missing invalidations, updates, and so on.
* fix strncpy call to avoid ASAN violation [1.5.17] (dormando, 2019-08-29; 1 file, -2/+20)
  Ensure we're only reading to the size of the smallest buffer, since they're both on the stack and could potentially overlap. Overlapping is defined as ... undefined behavior. I've looked through all available implementations of strncpy and they still only copy from the first \0 found. We'll also never read past the end of sun_path since we _supply_ sun_path with a proper null terminator.
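A minimal sketch of the safer pattern: bound the copy by the destination's size and terminate explicitly, rather than using strncpy between two stack buffers that ASAN flags. The function name is illustrative, not memcached's:

```c
#include <string.h>
#include <sys/un.h>

/* Copy a filesystem path into an AF_UNIX address safely: reject
 * paths that don't fit, copy exactly strlen(path) bytes, and write
 * the null terminator ourselves. */
static int set_sun_path(struct sockaddr_un *addr, const char *path) {
    size_t len = strlen(path);
    if (len >= sizeof(addr->sun_path))
        return -1; /* path too long for a unix socket address */
    memcpy(addr->sun_path, path, len);
    addr->sun_path[len] = '\0';
    return 0;
}
```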
* add server address to the "stats conns" output (Tharanga Gamaethige, 2019-08-29; 1 file, -59/+78)
  This is helpful when you need to identify which clients are connected to which server address when memcached listens on multiple addresses.
* Add a handler for seccomp crashes (Stanisław Pitucha, 2019-08-28; 1 file, -0/+4)
  When seccomp causes a crash, use a SIGSYS action and handle it to print out an error. Most functions are not allowed at that point (no buffered output, no ?printf functions, no abort, ...), so the implementation is as minimal as possible. Print out a message with the syscall number and exit the process (all threads).
* add error handling when calling dup function (minkikim89, 2019-08-27; 1 file, -1/+14)
* log client connection id with fetchers and mutations (Tharanga Gamaethige, 2019-08-27; 1 file, -3/+3)
  You can now monitor fetches and mutations of a given client.
* for configured slab sizes, need strdup (Shiv Nagarajan, 2019-08-22; 1 file, -1/+1)
* move mem_requested from slabs.c to items.c (dormando, 2019-07-26; 1 file, -1/+40)
  mem_requested is an oddball counter: it's the total number of bytes "actually requested" from the slab's caller. It's mainly used for a stats counter, alerting the user that the slab factor may not be efficient if the gap between total_chunks * chunk_size - mem_requested is large.

  However, since chunked items were added it's _also_ used to help the LRU balance itself. The total number of bytes used in the class vs the total number of bytes in a sub-LRU is used to judge whether to move items between sub-LRU's.

  This is a layer violation; forcing slabs.c to know more about how items work, as well as EXTSTORE for calculating item sizes from headers. Further, it turns out it wasn't necessary for item allocation: if we need to evict an item we _always_ pull from COLD_LRU or force a move from HOT_LRU. So the total doesn't matter.

  The total does matter in the LRU maintainer background thread. However, this thread caches mem_requested to avoid hitting the slab lock too frequently. Since sizes_bytes[] within items.c is generally redundant with mem_requested, we now total sizes_bytes[] from each sub-LRU before starting a batch of LRU juggles.

  This simplifies the code a bit, reduces the layer violations in slabs.c slightly, and actually speeds up some hot paths as a number of branches and operations are removed completely. This also fixes an issue I was having with the restartable memory branch :) recalculating p->requested and keeping a clean API is painful and slow.

  NOTE: This will vary a bit compared to what mem_requested originally did, mostly for large chunked items. For items which fit inside a single slab chunk, the stat is identical. However, items constructed by chaining chunks will have a single large "nbytes" value and end up in the highest slab class. Chunked items can be capped with chunks from smaller slab classes; you will see utilization of chunks but not an increase in mem_requested for them. I'm still thinking this through but this is probably acceptable. Large chunked items should be accounted for separately, perhaps with some new counters so they can be discounted from normal calculations.
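The sizes_bytes[] totaling described above amounts to a simple sum over per-sub-LRU byte counters before each juggle batch. A sketch with hypothetical names (the real counters live in items.c):

```c
#include <stdint.h>

#define SUB_LRUS 4  /* e.g. HOT, WARM, COLD, TEMP per slab class (illustrative) */

/* Hypothetical per-class counters: bytes linked into each sub-LRU. */
static uint64_t sizes_bytes[SUB_LRUS];

/* Total the per-sub-LRU byte counts once before a batch of LRU
 * juggles, rather than caching mem_requested from slabs.c under
 * the slab lock. */
static uint64_t lru_total_bytes(void) {
    uint64_t total = 0;
    for (int i = 0; i < SUB_LRUS; i++)
        total += sizes_bytes[i];
    return total;
}
```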
* Speed up incr/decr by replacing snprintf. [1.5.15] (Tharanga Gamaethige, 2019-05-20; 1 file, -1/+1)
  Uses the fast itoa_u64 function instead. Fixes #471.
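A conversion of the kind that replaces snprintf("%llu") on the incr/decr hot path can be sketched as below. This is an illustrative implementation, not memcached's actual itoa_u64:

```c
#include <stddef.h>
#include <stdint.h>

/* Convert an unsigned 64-bit value to decimal: emit digits into a
 * scratch buffer least-significant first, then reverse into buf.
 * Returns the string length; buf must hold at least 21 bytes. */
static size_t itoa_u64(uint64_t v, char *buf) {
    char tmp[20];
    size_t i = 0, len = 0;
    do { tmp[i++] = (char)('0' + (v % 10)); v /= 10; } while (v);
    while (i) buf[len++] = tmp[--i];
    buf[len] = '\0';
    return len;
}
```

Avoiding snprintf here skips format-string parsing and locale machinery on every increment.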
* change some links from http to https (kun, 2019-05-20; 1 file, -1/+1)
* widen internal item flags to 16bits. (dormando, 2019-05-20; 1 file, -2/+2)
  Did a weird dance: nsuffix is no longer an 8-bit length; it is replaced with an ITEM_CFLAGS bit, which indicates whether there is a 32-bit set of client flags in the item or not. This became possible after removing the inlined ascii response header in the previous commit.
* remove inline_ascii_response option (dormando, 2019-05-20; 1 file, -62/+16)
  Has defaulted to false since 1.5.0, and with -o modern for a few years before that. Performance is fine, no reported bugs. Always was the intention. Code is simpler without the option.
* -Y [filename] for ascii authentication mode (dormando, 2019-05-20; 1 file, -100/+306)
  Loads "username:password\n" tokens (up to 8) out of a supplied authfile. If enabled, disables the binary protocol (though it may be possible to enable both if sasl is also used?). Authentication is done via the "set" command. A separate handler is used to avoid some hot path conditionals and narrow the code executed in an unauthenticated state. ie:

    set foo 0 0 7\r\n
    foo bar\r\n

  returns "STORED" on success, else returns CLIENT_ERROR with some information. Any key is accepted: if using a client that doesn't try to authenticate when connecting to a pool of servers, the authentication set can be retried with the same key as the one that failed, to coerce the client into routing to the correct server. Otherwise an "auth" or similar key would always go to the same server.
* fix: idle-timeout wasn't compatible with binprot (dormando, 2019-05-13; 1 file, -0/+1)
  Binary protocol commands never updated the last command time, so connections would always get disconnected while in use by the idle timeout thread.
* update -h output for -I (max item size) [1.5.14] (dormando, 2019-04-27; 1 file, -1/+1)
  The limit got pushed to 1G with chunked items. Fixes #473.
* fix segfault in "lru" command (dormando, 2019-04-27; 1 file, -2/+2)
  Fixes #474: off-by-one in token count.
* extstore: error adjusting page_size after ext_path (dormando, 2019-04-27; 1 file, -0/+4)
  A temporary fix. Some folks ... randomize ... their start arguments, so this needs to be restructured in a way I'm happy with.
* close delete + incr item survival bug (dormando, 2019-04-26; 1 file, -6/+10)
  re #469 - delete actually locks/unlocks the item (and hashes the key!) three times. In between fetch and unlink, a fully locked add_delta() can run, deleting the underlying item. DELETE then returns success despite the original object hopscotching over it. I really need to get to the frontend rewrite soon :( This commit hasn't been fully audited for deadlocks on the stats counters or the extstore STORAGE_delete() function, but it passes tests.
* FreeBSD superpages checking. (David Carlier, 2019-04-26; 1 file, -1/+19)
  This knob has existed since FreeBSD 7.0 (2008) and can only be changed at boot time. It is enabled by default, on common architectures at least, but in some cases it might not be desired, so we check it regardless.
* Basic implementation of TLS for memcached. [1.5.13] (Tharanga Gamaethige, 2019-04-15; 1 file, -21/+329)
  Most of the work done by Tharanga. Some commits squashed in by dormando. Also reviewed by dormando. Tested, working, but experimental implementation of TLS for memcached. Enable with ./configure --enable-tls. Requires OpenSSL 1.1.0 or better. See `memcached -h` output for usage.
* fix INCR/DECR refcount leak for extstore headers [1.5.12] (dormando, 2018-10-23; 1 file, -0/+1)
  Bug added in 2014. The same condition was reused to bounce incr/decr commands off of CHUNKED and ITEM_HDR items. Thus, incr/decr'ing a value that is already in extstore would immediately cause a refcount leak :(
* extstore: balance IO thread queues [1.5.11] (dormando, 2018-10-02; 1 file, -0/+1)
  Queues were round-robin before. During sustained overload some queues can get behind while others stay empty. Simply do a bit more work to track depth and pick the lowest queue. This is fine for now since the bottleneck remains elsewhere. Been meaning to do this; benchmark work made it more obvious.
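The "pick the lowest queue" step above reduces to a linear scan over per-queue depth counters. A minimal sketch with illustrative names:

```c
#include <stddef.h>

/* Instead of round-robin dispatch, scan the tracked per-queue depths
 * and return the index of the least-loaded IO thread queue. */
static size_t pick_io_queue(const unsigned int *depths, size_t nqueues) {
    size_t best = 0;
    for (size_t i = 1; i < nqueues; i++)
        if (depths[i] < depths[best])
            best = i;
    return best;
}
```

A linear scan is fine here because the queue count is small (one per IO thread) and, per the commit, the bottleneck is elsewhere.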
* Remove some unused variables from main (Nick Frost, 2018-08-15; 1 file, -9/+0)
* expand NEED_ALIGN for chunked items (dormando, 2018-08-08; 1 file, -5/+28)
  Some whackarse ARM platforms on specific glibc/gcc (new?) versions trip SIGBUS while reading the header chunk for a split item. The header chunk is unfortunate magic: it lives in ITEM_data() at a random offset, is zero sized, and only exists to simplify code around finding the original slab class, and linking/relinking subchunks to an item. There's no fix to this which isn't a lot of code. I need to refactor chunked items, and attempted to do so, but couldn't come up with something I liked quickly enough. This change pads the first chunk if alignment is necessary, which wastes bytes and a little CPU, but I'm not going to worry a ton for these obscure platforms. This works with rebalancing because in the case of an ITEM_CHUNKED header, it treats the item size as the size of the class it resides in, and memcpy's the item during recovery. All other cases were changed from ITEM_data to a new ITEM_schunk() inline function that is created when NEED_ALIGN is set; else it's equal to ITEM_data still.
* extstore JBOD support (dormando, 2018-08-06; 1 file, -18/+20)
  Just a Bunch Of Devices :P Code exists for routing specific devices to specific buckets (lowttl/compact/etc), but enabling it requires significant fixes to the compaction algorithm, thus it is disabled as of this writing. Code cleanups and future work:
  - pedantically freeing memory and closing fd's on exit
  - unify and flatten the free_bucket code
  - defines for free buckets
  - page eviction adjustment (force min-free per free bucket)
  - fix default calculation for compact_under and drop_under
  - might require forcing this value only on default bucket
* split storage writer into its own thread (dormando, 2018-08-03; 1 file, -0/+4)
  Trying out a simplified slab class backoff algorithm. The LRU maintainer individually schedules slab classes by time, which leads to multiple wakeups in a steady state as they get out of sync. This algorithm more simply skips that class more often each time it runs the main loop, using a single scheduled sleep instead. If it goes to sleep for a long time, it also reduces the backoff for all classes. If we're barely awake it should be fine to poke everything.
* fix ASCII get error handling (+ extstore leak) [1.5.9] (dormando, 2018-07-07; 1 file, -25/+45)
  Apparently since (forever?) the double while loop in process_get_command() would turn into an infinite loop (and leak memory/die) if add_iov() ever failed. The recently added get_extstore() is more likely to spuriously fail, so it turned into a problem. This creates a common path for the key length abort as well. Adds a test, which breaks several ways before this patch.
* drop_privileges is no longer default if available. (dormando, 2018-07-06; 1 file, -2/+11)
  Adds `-o drop_privileges` along with the existing no_drop_privileges. The feature is experimental, and causing some user pain: I'd forgotten about the disable flag entirely, somehow. Changing defaults (especially for a security feature) is not a typical thing to do, but we should have done this from the start like all other features: initially gated, then added to `modern`, then switched to default once mature.
* support transparent hugepages on Linux (Chen-Yu Tsai, 2018-07-05; 1 file, -0/+10)
  Linux has supported transparent huge pages for quite some time. Memory regions can be marked for conversion to huge pages with madvise. Alternatively, users can have the system default to using huge pages for all memory regions when applicable, i.e. when the mapped region is large enough, properly aligned pages will be converted. Using either method, we preallocate memory for the cache with proper alignment and call madvise on it. Whether the memory region actually gets converted to hugepages ultimately depends on the setting of /sys/kernel/mm/transparent_hugepage/enabled. The existence of this file is also checked to see if transparent huge page support is compiled into the kernel. If any step of the preallocation fails, we simply fall back to standard allocation, without even preallocating slabs, as they would not have the proper alignment or settings anyway.
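The preallocate-then-madvise path can be sketched as below. This is a simplified illustration, not memcached's actual code; the 2MB page size is an assumption (common on x86-64), and a real version would read it from the system:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define HUGE_PAGE_SIZE (2 * 1024 * 1024) /* assumed THP size */

/* Allocate a region aligned to the huge page size and ask the kernel
 * to back it with transparent huge pages. madvise failure is treated
 * as non-fatal: the region is still usable as ordinary pages. */
static void *alloc_thp_region(size_t size) {
    void *ptr = NULL;
    if (posix_memalign(&ptr, HUGE_PAGE_SIZE, size) != 0)
        return NULL; /* caller falls back to standard allocation */
    /* Actual conversion still depends on
     * /sys/kernel/mm/transparent_hugepage/enabled. */
    (void)madvise(ptr, size, MADV_HUGEPAGE);
    return ptr;
}
```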
* add utility macro to replace repetitive code snippet. (Linkerist, 2018-06-27; 1 file, -32/+5)
* add several extstore options for help info. (Linkerist, 2018-06-27; 1 file, -0/+2)
* Fix segfault: Prevent calling sasl_server_step before sasl_server_start (Paul Furtado, 2018-06-04; 1 file, -0/+9)
  If sasl_server_step is called on a sasl_conn which has not had sasl_server_start called on it, it will segfault from reading uninitialized memory. Memcached currently calls sasl_server_start when the client sends the PROTOCOL_BINARY_CMD_SASL_AUTH command and sasl_server_step when the client sends the PROTOCOL_BINARY_CMD_SASL_STEP command. So if the client sends SASL_STEP before SASL_AUTH, the server segfaults. For well-behaved clients, this case never happens; but for the java-memcached-client, when configured with an incorrect password, it happens very frequently. This is likely because the client handles auth on a background thread and the socket may be swapped out in the middle of authentication. You can see that code here: https://github.com/dustin/java-memcached-client/blob/master/src/main/java/net/spy/memcached/auth/AuthThread.java
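The guard described above boils down to per-connection state tracking: refuse SASL_STEP unless sasl_server_start has run. A sketch with a simplified stand-in for memcached's connection struct (all names here are illustrative):

```c
#include <stdbool.h>

/* Simplified stand-in for the per-connection state. */
typedef struct {
    bool sasl_started; /* set true after sasl_server_start succeeds */
} conn;

/* Returns 0 if the step may proceed, -1 if the client must start
 * authentication first; in the latter case the server responds with
 * an auth error instead of calling sasl_server_step on an
 * uninitialized sasl_conn. */
static int check_sasl_step_allowed(const conn *c) {
    if (!c->sasl_started)
        return -1;
    return 0;
}
```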
* alignment and 32bit fixes for extstore (dormando, 2018-05-22; 1 file, -2/+13)
  Memory alignment when reading header data back. The "32" left in a few places, which should have at least been a define, is now properly an offsetof; it is used for skipping crc32 for the dynamic parts of the item headers.
* extstore: fix ref leak when using binprot GATK [1.5.7] (dormando, 2018-03-27; 1 file, -3/+4)
  GATK returns a key but not the value. c->io_wraplist is only appended if the value is to be returned, but c->item is skipped if it is an ITEM_HDR at all. This now checks for the ITEM_HDR bit being set but also !value, which then reclaims the reference normally. I knew doubling up the cleanup code made it a lot more complex, and hope to flatten that to a single path. Also the TOUCH/GAT/GATK binprot code has no real test coverage, nor mc-crusher entries. Should be worth fixing.
* Drop supplementary groups in addition to setgid (Anthony Ryan, 2018-03-24; 1 file, -0/+4)
  When launching as root, we drop permissions to run as a specific user, however the way this is done fails to drop supplementary groups from the process. On many systems, the root user is a member of many groups used to guard important system files. If we neglect to drop these groups it's still possible for a compromised memcached instance to access critical system files as though it were running with root level permissions.

  On any given system `grep root /etc/group` will reveal all the groups root is a member of, and memcached will have access to any file these groups are authorized for, in spite of our attempts to drop this access. It's possible to test if a given memcached instance is affected by this and running with elevated permissions by checking the respective line in procfs: `grep Groups /proc/$pid/status`. If any groups are listed, memcached would have access to everything the listed groups have access to. After this patch no groups will be listed and memcached will be locked down properly.
* update --help for UDP default (dormando, 2018-03-02; 1 file, -1/+1)
* disable UDP port by default [1.5.6] (dormando, 2018-02-27; 1 file, -4/+2)
  As reported, UDP amplification attacks have started to use insecure internet-exposed memcached instances. UDP used to be a much more popular transport for memcached many years ago, but I'm not aware of many recent users. Ten years ago, the TCP connection overhead from many clients was relatively high (dozens or hundreds per client server), but these days many clients are batched, or use fewer processes, or simply aren't worried about it. While changing the default to listen on localhost only would also help, the true culprit is UDP. There are many more use cases for using memcached over the network than there are for using the UDP protocol.
* Replace event_init() with new API if detect newer version (Qian Li, 2018-02-19; 1 file, -0/+13)
  If we detect libevent version >= 2.0.2-alpha, use event_base_new_with_config() instead of the obsolete event_init() for creating a new event base. Set the config flag EVENT_BASE_FLAG_NOLOCK to avoid lock/unlock around every libevent operation. For newer versions of libevent, the event_init() API is deprecated and totally unsafe for multithreaded use. By using the new API, we can explicitly disable/enable locking on the event_base.
* non-issue leak found via static analysis #2 (dormando, 2018-02-19; 1 file, -1/+2)
  Issue #338 reported a memory leak in the init code. Another non-issue, since it's a handful of bytes and that code path is only used in a couple of tests.
* non-issue leak found via static analysis (dormando, 2018-02-19; 1 file, -0/+2)
  Issue #337 reported a memory leak, but in these cases the process exits anyway.
* fix gcc warnings (Miroslav Lichvar, 2018-02-19; 1 file, -1/+1)
* remove redundant counter/lock from hash table [1.5.5] (dormando, 2018-02-12; 1 file, -0/+4)
  curr_items tracks how many items are linked in the hash table. Internally, hash_items tracked how many items were in the hash table; on every insert/delete, hash_items had to be locked and checked to see if the table should be expanded. Rip that all out, and instead run a check from the once-per-second clock event to test for hash table expansion. This actually ends up fixing an obscure bug: if you burst a bunch of sets then stop, the hash table won't attempt to expand a second time until the next insert. With this change, every second the hash table has a chance of expanding again.
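The once-per-second check amounts to comparing the item count against the current table size without holding a lock. A sketch under the assumption of a load-factor threshold of 1.5 (names and threshold are illustrative, not memcached's exact code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Called from the clock event: decide whether the hash table should
 * start expanding. hashpower is the log2 of the bucket count. A
 * stale read of curr_items is fine here; we re-check every second. */
static bool hash_needs_expand(uint64_t curr_items, unsigned int hashpower) {
    uint64_t buckets = (uint64_t)1 << hashpower;
    /* expand once the load factor exceeds 1.5 */
    return curr_items > buckets + buckets / 2;
}
```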
* limit crawls for metadumper (dormando, 2018-02-12; 1 file, -2/+3)
  The LRU crawler metadumper is used for getting snapshot-y looks at the LRU's. Since there was no default limit, it would pick up any new items added or bumped since the roll started. With this change it limits the number of items dumped to the number that existed in that LRU when the roll was kicked off. You still end up with an approximation, but not a terrible one:
  - items bumped after the crawler passes them likely won't be revisited
  - items bumped before the crawler passes them will likely be visited toward the end, or mixed with new items
  - deletes are somewhere in the middle
* extstore: fix segfault in 'extstore' adm command (dormando, 2018-01-23; 1 file, -1/+1)
  Would segfault if you gave it only 2 arguments :|
* extstore: doc fixes (dormando, 2017-12-20; 1 file, -1/+1)
  Needs more writing; will happen over time. At least --help has the right number of newlines...
* extstore: handle errors in binprot path (dormando, 2017-12-20; 1 file, -9/+19)
  I purposefully broke _get_extstore and the error turned into a miss as expected. Tests pass with/without extstore.
* extstore: fix bad default for free_memchunks (dormando, 2017-12-19; 1 file, -2/+2)
  Was initializing to 1, but we want it to be zero until the thing has a chance to fill and flip on the balancer algo.
* extstore: default item_age to UINT_MAX (dormando, 2017-12-18; 1 file, -1/+1)
  With the page mover algo being less shitty, we shouldn't rely on item_age except for weird scenarios.