path: root/src/core/socket.c
Commit message [Author, Age, Files, Lines]
* Merge pull request #4067 from poettering/invocation-id [Zbigniew Jędrzejewski-Szmek, 2016-10-11, 1 file, -1/+4]
|\ Add an "invocation ID" concept to the service manager
| * core: add "invocation ID" concept to service manager [Lennart Poettering, 2016-10-07, 1 file, -1/+4]
| | This adds a new invocation ID concept to the service manager. The invocation ID identifies each runtime cycle of a unit uniquely. A new randomized 128-bit ID is generated each time a unit moves from an inactive to an activating or active state.
| | The primary use case for this concept is to connect the runtime data PID 1 maintains about a service with the offline data the journal stores about it. Previously we'd use the unit name plus start/stop times, which however is highly racy since the journal will generally process log data after the service already ended.
| | The "invocation ID" kinda matches the "boot ID" concept of the Linux kernel, except that it applies to an individual unit instead of the whole system.
| | The invocation ID is passed to the activated processes as an environment variable. It is additionally stored as an extended attribute on the cgroup of the unit. The latter is used by journald to automatically retrieve it for each logged message and attach it to the log entry. The environment variable is very easily accessible, even for unprivileged services. OTOH the extended attribute is only accessible to privileged processes (this is because cgroupfs only supports the "trusted." xattr namespace, not "user."). The environment variable may be altered by services, the extended attribute may not be, hence it is the better choice for the journal.
| | Note that reading the invocation ID off the extended attribute from journald is racy, similar to the way reading the unit name for a logging process is.
| | This patch adds APIs to read the invocation ID to sd-id128: sd_id128_get_invocation() may be used in a similar fashion to sd_id128_get_boot().
| | PID1's own logging is updated to always include the invocation ID when it logs information about a unit.
| | A new bus call GetUnitByInvocationID() is added that allows retrieving a bus path to a unit by its invocation ID. The bus path is built using the invocation ID, thus providing a path for referring to a unit that is valid only for the current runtime cycle of it.
| | Outlook for the future: should the kernel eventually allow passing of cgroup information along AF_UNIX/SOCK_DGRAM messages via a unique cgroup id, then we can alter the invocation ID to be generated as a hash from that rather than entirely randomly. This way we can derive the invocation ID race-freely from the messages.
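As a rough illustration of the environment-variable route described in this commit message, the sketch below parses a 128-bit ID from its 32-character hexadecimal form. It assumes the variable is named `INVOCATION_ID`, as documented in systemd.exec(5); the helpers `hex_value()`, `parse_id128()` and `read_invocation_id()` are hypothetical and are not part of sd-id128, whose `sd_id128_get_invocation()` would normally be the call to use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit. */
int hex_value(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
}

/* Hypothetical helper: parse exactly 32 hex characters into 16 bytes.
 * Returns 0 on success, -1 on missing or malformed input. */
int parse_id128(const char *s, uint8_t out[16]) {
        assert(out);
        if (!s || strlen(s) != 32)
                return -1;
        for (size_t i = 0; i < 16; i++) {
                int hi = hex_value(s[2 * i]);
                int lo = hex_value(s[2 * i + 1]);
                if (hi < 0 || lo < 0)
                        return -1;
                out[i] = (uint8_t) ((hi << 4) | lo);
        }
        return 0;
}

/* Read the ID from the environment (assumed variable name: INVOCATION_ID).
 * Fails with -1 if the variable is unset, e.g. outside the service manager. */
int read_invocation_id(uint8_t out[16]) {
        return parse_id128(getenv("INVOCATION_ID"), out);
}
```

Since a service can alter its own environment, a value read this way is only as trustworthy as the process itself, which is the reason the commit message gives for journald preferring the extended attribute.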
* | core: when determining whether a process exit status is clean, consider whether it is a command or a daemon [Lennart Poettering, 2016-10-10, 1 file, -1/+1]
|/ SIGTERM should be considered a clean exit code for daemons (i.e. long-running processes, as a daemon without a SIGTERM handler may still be shut down without issues via SIGTERM) while it should not be considered a clean exit code for commands (i.e. short-running processes). Let's add two different clean checking modes for this, and use the right one at the appropriate places.
|/ Fixes: #4275
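The two checking modes can be pictured with the minimal sketch below. All names (`exit_kind`, `clean_mode`, `exit_is_clean()`) are hypothetical, and the sketch covers only the SIGTERM case spelled out in the commit message, not whatever else the real check may accept.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical names throughout; this only illustrates the two checking modes. */
enum exit_kind { KIND_EXITED, KIND_KILLED };      /* how the process ended */
enum clean_mode { CLEAN_COMMAND, CLEAN_DAEMON };  /* what kind of process it was */

/* status is the exit code for KIND_EXITED, or the signal number for KIND_KILLED. */
bool exit_is_clean(enum exit_kind kind, int status, enum clean_mode mode) {
        if (kind == KIND_EXITED)
                return status == 0;

        /* Only a daemon may be terminated by SIGTERM and still count as clean. */
        return mode == CLEAN_DAEMON && status == SIGTERM;
}
```

Under these assumptions a caller would pass `CLEAN_DAEMON` for long-running main processes and `CLEAN_COMMAND` for short-running helper commands.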
* core: Fix USB functionfs activation and clarify its documentation (#4188) [Paweł Szewczyk, 2016-09-26, 1 file, -7/+2]
| There was no certainty about what the path in the service file should look like for USB functionfs activation. Because of this it was treated differently in different places, which made this feature unusable.
| This patch fixes the path to be the *mount directory* of functionfs, not the ep0 file path, and clarifies in the documentation that ListenUSBFunction should be the location of the functionfs mount point, not the ep0 file itself.
* core: add RemoveIPC= setting [Lennart Poettering, 2016-08-19, 1 file, -0/+2]
| This adds the boolean RemoveIPC= setting to service, socket, mount and swap units (i.e. all unit types that may invoke processes). If turned on, and the unit's user/group is not root, all IPC objects of the user/group are removed when the service is shut down. The life-cycle of the IPC objects is hence bound to the unit life-cycle.
| This is particularly relevant for units with dynamic users, as it is essential that no objects owned by the dynamic users survive the service exiting. In fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
| In order to communicate the UID/GID of an executed process back to PID 1 this adds a new "user lookup" socket pair, that is inherited into the forked processes, and closed before the exec(). This is needed since we cannot do NSS from PID 1 due to deadlock risks, but we need to know the used UID/GID in order to clean up IPC owned by it if the unit shuts down.
* Merge pull request #3818 from poettering/exit-status-env [Zbigniew Jędrzejewski-Szmek, 2016-08-05, 1 file, -12/+10]
|\ beef up /var/tmp and /tmp handling; set $SERVICE_RESULT/$EXIT_CODE/$EXIT_STATUS on ExecStop= and make sure root/nobody are always resolvable
| * core: remember first unit failure, not last unit failure [Lennart Poettering, 2016-08-04, 1 file, -5/+5]
| | Previously, the result value of a unit was overridden with each failure that took place, so that the result always reported the last failure that took place.
| | With this commit this is changed, so that the first failure taking place is stored instead. This should normally not matter much as multiple failures are sufficiently uncommon. However, it improves one behaviour: if we send SIGABRT to a service due to a watchdog timeout, then this currently would be reported as "coredump" failure, rather than the "watchdog" failure it really is. Hence, in order to report information about the type of the failure, and not about the effect of it, let's change this for all unit types to store the first, not the last failure.
| | This addresses the issue pointed out here: https://github.com/systemd/systemd/pull/3818#discussion_r73433520
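A minimal sketch of the "first failure wins" rule follows. The enum values and the `unit_note_result()` helper are hypothetical stand-ins, chosen to mirror the watchdog/coredump example from the commit message.

```c
#include <assert.h>

/* Hypothetical result codes, loosely modelled on the ones named in the commit message. */
enum unit_result { RESULT_SUCCESS, RESULT_WATCHDOG, RESULT_COREDUMP };

struct unit_state {
        enum unit_result result;
};

/* Record a failure only if none has been recorded yet, so the first cause is kept. */
void unit_note_result(struct unit_state *u, enum unit_result r) {
        assert(u);
        if (u->result == RESULT_SUCCESS)
                u->result = r;
}
```

With this rule, a watchdog timeout that later produces a core dump through SIGABRT stays reported as the watchdog failure, because the second call no longer overwrites the stored value.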
| * core: turn various execution flags into a proper flags parameter [Lennart Poettering, 2016-08-04, 1 file, -7/+5]
| | The ExecParameters structure contains a number of bit-flags that were so far exposed as bool:1; change this to a proper, single binary bit flag field. This makes things a bit more expressive, and is helpful as we add more flags, since these booleans are passed around in various callers, for example service_spawn(), whose signature can be made much shorter now.
| | Not all bit booleans from ExecParameters are moved into the flags field for now, but this can be added later.
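The general pattern of replacing several `bool:1` members with one flags field might look like the sketch below. The flag names and the `spawn_params` structure are hypothetical; the commit message does not list the actual ExecParameters flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; each former bool:1 member becomes one bit. */
enum spawn_flags {
        SPAWN_APPLY_PERMISSIONS = 1 << 0,
        SPAWN_APPLY_CHROOT      = 1 << 1,
        SPAWN_IS_CONTROL        = 1 << 2,
};

struct spawn_params {
        unsigned flags; /* bitwise OR of enum spawn_flags values */
};

/* True if every bit of f is set in the parameters. */
bool spawn_has_flag(const struct spawn_params *p, enum spawn_flags f) {
        assert(p);
        return (p->flags & (unsigned) f) == (unsigned) f;
}
```

A function that previously took three boolean arguments can then take a single flags argument, which is the signature shortening the commit message refers to.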
* | socket: add helper function to remove code duplication [Zbigniew Jędrzejewski-Szmek, 2016-08-05, 1 file, -53/+29]
| |
* | core/socket: include remote address in the message when dropping connection [Zbigniew Jędrzejewski-Szmek, 2016-08-05, 1 file, -3/+7]
| | Without the address the message is not very useful.
| | Aug 04 23:52:21 rawhide systemd[1]: testlimit.socket: Too many incoming connections (4) from source ::1, dropping connection.
* | systemd: do not serialize peer, bump count when deserializing socket instead [Zbigniew Jędrzejewski-Szmek, 2016-08-05, 1 file, -41/+1]
| |
* | core/socket: rework SocketPeer refcounting [Zbigniew Jędrzejewski-Szmek, 2016-08-05, 1 file, -94/+100]
| | Make functions and definitions that don't need to be shared local to socket.c.
* | systemd: convert peers_by_address to a set [Zbigniew Jędrzejewski-Szmek, 2016-08-04, 1 file, -8/+8]
|/
* socket: add support to control no. of connections from one source (#3607) [Susant Sahani, 2016-08-02, 1 file, -0/+185]
| Introduce MaxConnectionsPerSource=, which is the number of concurrent connections allowed per IP.
| RFE: 1939
* core: add a concept of "dynamic" user ids, that are allocated as long as a service is running [Lennart Poettering, 2016-07-22, 1 file, -1/+14]
| This adds a new boolean setting DynamicUser= to service files. If set, a new user will be allocated dynamically when the unit is started, and released when it is stopped. The user ID is allocated from the range 61184..65519. The user will not be added to /etc/passwd (but an NSS module to be added later should make it show up in getent passwd).
| For now, care should be taken that the service writes no files to disk, since this might result in files owned by UIDs that might get assigned dynamically to a different service later on. Later patches will tighten sandboxing in order to ensure that this cannot happen, except for a few selected directories.
| A simple way to test this is:
|     systemd-run -p DynamicUser=1 /bin/sleep 99999
* tree-wide: htonl() is weird, let's use htobe32() instead (#3538) [Lennart Poettering, 2016-06-15, 1 file, -8/+8]
| Super-important change, yeah!
* Merge pull request #3202 from poettering/socket-fixes [Martin Pitt, 2016-05-08, 1 file, -58/+114]
|\ don't reopen socket fds when reloading the daemon
| * core: rework how we flush incoming traffic when a socket unit goes down [Lennart Poettering, 2016-05-06, 1 file, -20/+19]
| | Previously, we'd simply close and reopen the socket file descriptors. This is problematic however, as we won't transition through the SOCKET_CHOWN state then, and thus the file ownership won't be correct for the sockets.
| | Rework the flushing logic, and actually read any queued data from the sockets for flushing, and accept any queued messages and disconnect them.
| * core: don't implicit open missing socket fds on daemon reload [Lennart Poettering, 2016-05-06, 1 file, -8/+46]
| | Previously, when the daemon was reloaded and the configuration of a socket unit file was changed so that a different set of socket ports was defined for the socket, we'd simply reopen the socket fds not yet open. This is problematic however, as this means the SOCKET_CHOWN state is not run for them, and thus their UID/GID is not corrected.
| | With this change, don't open the missing file descriptors, but log about this issue, and ask the user to restart the socket explicitly, to make sure all missing fds are opened.
| | Fixes: #3171
| * core: split out selinux label retrieval logic into a function of its own [Lennart Poettering, 2016-05-06, 1 file, -30/+49]
| | This should bring no behavioural change.
* | core: dump TriggerLimitIntervalSec and TriggerLimitBurst too [Evgeny Vereshchagin, 2016-05-06, 1 file, -0/+6]
|/
* core: fix owner user/group output in socket dump [Lennart Poettering, 2016-05-05, 1 file, -4/+5]
| The unit file settings are called SocketUser= and SocketGroup=, hence name these fields that way in the "systemd-analyze dump" output too.
| https://github.com/systemd/systemd/issues/3171#issuecomment-216216995
* core: change default trigger limits for socket units [Lennart Poettering, 2016-05-05, 1 file, -1/+21]
| Let's lower the default values a bit, and pick different defaults for Accept=yes and Accept=no sockets.
| Fixes: #3167
* core: don't propagate service state to sockets as long as there's still a job for the service queued [Lennart Poettering, 2016-05-02, 1 file, -4/+13]
|
* core: move enforcement of the start limit into per-unit-type code again [Lennart Poettering, 2016-05-02, 1 file, -0/+8]
| Let's move the enforcement of the per-unit start limit from unit.c into the type-specific files again. For unit types that know a concept of "result" codes this allows us to hook up the start limit condition to it with an explicit result code. Also, this makes sure that the state checks in calls like service_start() may be done before the start limit is checked, as the start limit really should be checked last, right before everything has been verified to be in order.
| The generic start limit logic is left in unit.c, but the invocation of it is moved into the per-type files, in the various xyz_start() functions, so that they may place the check at the right location.
| Note that this change drops the enforcement entirely from device, slice, target and scope units, since these unit types generally may not fail activation, or may only be activated a single time. This is also documented now.
| Note that this restores the "start-limit-hit" result code that existed before 6bf0f408e4833152197fb38fb10a9989c89f3a59 already in the service code. However, it's not introduced for all units that have a result code concept.
| Fixes #3166.
* core: rework socket/service GC logic [Lennart Poettering, 2016-04-29, 1 file, -4/+1]
| There's no need to set the no_gc bit for service units that socket units prepare, as we always keep a proper reference (as maintained by unit_ref_set()) on them, and such references are honoured by the GC logic anyway.
| Moreover, explicitly setting the no_gc bit is problematic if the socket gets GC'ed for a reason, as the service might then leak with the bit set.
* socket: really always close auxiliary fds when closing socket fds [Lennart Poettering, 2016-04-29, 1 file, -26/+24]
|
* core: make sure to close connection fd when we fail to activate a per-connection service [Lennart Poettering, 2016-04-29, 1 file, -1/+5]
| Fixes: #2993 #2691
* core: introduce activation rate limiting for socket units [Lennart Poettering, 2016-04-29, 1 file, -3/+17]
| This adds two new settings TriggerLimitIntervalSec= and TriggerLimitBurst= that define a rate limit for activation of socket units. When the limit is hit, the socket is put into a failure mode. This is an alternative fix for #2467, since the original fix resulted in issue #2684.
| In a later commit the StartLimitInterval=/StartLimitBurst= rate limiter will be changed to be applied after any start conditions checks are made. This way, there are two separate rate limiters enforced: one at triggering time, before any jobs are queued with this patch, as well as the start limit that is moved again to be run immediately before the unit is activated. Condition checks are done in between the two, and thus no longer affect the start limit.
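The interval/burst idea behind TriggerLimitIntervalSec= and TriggerLimitBurst= can be sketched as a simple fixed-window counter. This is not systemd's own rate limiter; the `rate_limit` structure and `rate_limit_allow()` are hypothetical, and timestamps are passed in explicitly (assumed to be monotonic microseconds) to keep the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate limiter: allow at most `burst` events per `interval_usec` window. */
struct rate_limit {
        uint64_t interval_usec; /* window length */
        unsigned burst;         /* events permitted per window */
        uint64_t window_start;  /* start of the current window */
        unsigned count;         /* events admitted in the current window */
};

/* Returns true if the event at time now_usec is within the limit, false once it is
 * exceeded. Assumes now_usec never goes backwards between calls. */
bool rate_limit_allow(struct rate_limit *r, uint64_t now_usec) {
        assert(r);

        /* Start a fresh window on first use or once the old one has elapsed. */
        if (r->count == 0 || now_usec - r->window_start >= r->interval_usec) {
                r->window_start = now_usec;
                r->count = 0;
        }

        if (r->count >= r->burst)
                return false;

        r->count++;
        return true;
}
```

In terms of the commit message, a `false` return corresponds to the point where the socket would be put into its failure mode instead of queueing another activation job.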
* core,systemctl: add bus API to retrieve processes of a unit [Lennart Poettering, 2016-04-22, 1 file, -0/+10]
| This adds a new GetProcesses() bus call to the Unit object which returns an array consisting of all PIDs, their process names, as well as their full cgroup paths. This is then used by "systemctl status" to show the per-unit process tree.
| This has the benefit that the client-side no longer needs to access the cgroupfs directly to show the process tree of a unit. Instead, it now uses this new API, which means it also works if -H or -M are used correctly, as the information from the specific host is used, and not the one from the local system.
| Fixes: #2945
* core: remove ManagerRunningAs enum [Lennart Poettering, 2016-04-12, 1 file, -1/+1]
| Previously, we had two enums ManagerRunningAs and UnitFileScope, that were mostly identical and converted from one to the other all the time. The latter had one more value UNIT_FILE_GLOBAL however.
| Let's simplify things, and remove ManagerRunningAs and replace it by UnitFileScope everywhere, thus making the translation unnecessary. Introduce two new macros MANAGER_IS_SYSTEM() and MANAGER_IS_USER() to simplify checking if we are running in the system or the user context.
* core: Fix path for opening ffs endpoint ep0 [Georgia Brikis, 2016-03-23, 1 file, -2/+6]
| usbffs_address_create() expects an absolute path to the file that is supposed to be opened. The path specified only leads to the directory containing the endpoint ep0, not the endpoint itself. This commit adds the endpoint's name to the path.
* tree-wide: make ++/-- usage consistent WRT spacing [Vito Caputo, 2016-02-22, 1 file, -2/+2]
| Throughout the tree there's spurious use of spaces separating ++ and -- operators from their respective operands. Make ++ and -- operator consistent with the majority of existing uses; discard the spaces.
* Remove kdbus custom endpoint support [Daniel Mack, 2016-02-11, 1 file, -1/+0]
| This feature will not be used anytime soon, so remove a bit of cruft.
| The BusPolicy= config directive will stay around as compat noop.
* Merge pull request #2569 from zonque/removals [Martin Pitt, 2016-02-10, 1 file, -2/+0]
|\ Remove some old cruft
| * tree-wide: remove Emacs lines from all files [Daniel Mack, 2016-02-10, 1 file, -2/+0]
| | This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.
* | core: make the StartLimitXYZ= settings generic and apply to any kind of unit, not just services [Lennart Poettering, 2016-02-10, 1 file, -34/+13]
|/ This moves the StartLimitBurst=, StartLimitInterval=, StartLimitAction= and RebootArgument= settings from the [Service] section into the [Unit] section of unit files, and thus supports them in all unit types, not just in services.
|/ This way we can enforce the start limit much earlier, in particular before testing the unit conditions, so that repeated start-up failure due to failed conditions is also considered for the start limit logic.
|/ For compatibility the four options may also be configured in the [Service] section still, but we only document them in their new section [Unit].
|/ This also renames the socket unit failure code "service-failed-permanent" into "service-start-limit-hit" to express more clearly what it is about; after all, it's only triggered through the start limit being hit.
|/ Finally, the code in busname_trigger_notify() and socket_trigger_notify() is altered to become more alike.
|/ Fixes: #2467
* core: rework job_get_timeout() to use usec_t and handle USEC_INFINITY time events correctly [Lennart Poettering, 2016-02-04, 1 file, -2/+6]
|
* core: rework unit timeout handling, and add new setting RuntimeMaxSec= [Lennart Poettering, 2016-02-01, 1 file, -21/+16]
| This cleans up timeout handling in PID 1. Specifically, instead of storing 0 in internal timeout variables as indication for a disabled timeout, use USEC_INFINITY which is in-line with how we do this in the rest of our code (following the logic that 0 means "no", and USEC_INFINITY means "never").
| This also replaces all usec_t additions with invocations to usec_add(), so that USEC_INFINITY is properly propagated, and sd-event considers it as an indication for turning off the event source.
| This also alters the deserialization of the units to restart timeouts from the time they were originally started from. Before this patch timeouts would be restarted beginning with the time of the deserialization, which could lead to artificially prolonged timeouts if a daemon reload took place.
| Finally, a new RuntimeMaxSec= setting is introduced for service units, that specifies a maximum runtime after which a specific service is forcibly terminated. This is useful to put time limits on time-intensive processing jobs.
| This also simplifies the various xyz_spawn() calls of the various types in that explicit destruction of the timers is removed, as that is done anyway by the state change handlers, and a state change is always done when the xyz_spawn() calls fail.
| Fixes: #2249
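The propagation rule for USEC_INFINITY described in this commit message can be illustrated with a saturating add. The names `usec_t`, `USEC_INFINITY` and `usec_add()` come from the commit message, but the definitions below are an assumption made for this sketch (a 64-bit microsecond count with the all-ones value meaning "never"), not a copy of the tree's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions for illustration only. */
typedef uint64_t usec_t;
#define USEC_INFINITY ((usec_t) -1)

/* Saturating add: infinity is sticky, and overflow also yields infinity. */
usec_t usec_add(usec_t a, usec_t b) {
        if (a == USEC_INFINITY || b == USEC_INFINITY)
                return USEC_INFINITY;

        if (a > USEC_INFINITY - b) /* a + b would wrap around */
                return USEC_INFINITY;

        return a + b;
}
```

With a plain `a + b`, adding a timeout of USEC_INFINITY to a start timestamp would wrap around to a small value and fire almost immediately; the saturating form keeps "never" meaning "never".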
* core: socket options fix SCTP_NODELAY [Susant Sahani, 2015-12-31, 1 file, -3/+9]
| SCTP_NODELAY is different from TCP_NODELAY. Apply proper options in case of SCTP.
* socket: nullify pointers after free [Daniel Mack, 2015-12-22, 1 file, -5/+5]
| A socket shouldn't be used after socket_done() returns, but follow the general guideline here and avoid dangling pointers anyway.
* socket: free fdname member [Daniel Mack, 2015-12-22, 1 file, -0/+2]
| Plug a small memory leak.
* tree-wide: expose "p"-suffix unref calls in public APIs to make gcc cleanup easy [Lennart Poettering, 2015-11-27, 1 file, -1/+1]
| GLIB has recently started to officially support the gcc cleanup attribute in its public API, hence let's do the same for our APIs.
| With this patch we'll define an xyz_unrefp() call for each public xyz_unref() call, to make it easy to use inside a __attribute__((cleanup())) expression. Then, all code is ported over to make use of this.
| The new calls are also documented in the man pages, with examples how to use them (well, I only added docs where the _unref() call itself already had docs, and the examples only cover sd_bus_unrefp() and sd_event_unrefp()).
| This also renames sd_lldp_free() to sd_lldp_unref(), since that's how we tend to call our destructors these days.
| Note that this defines no public macro that wraps gcc's attribute and makes it easier to use. While I think it's our duty in the library to make our stuff easy to use, I figure it's not our duty to make gcc's own features easy to use on its own. Most likely, client code which wants to make use of this should define its own:
|     #define _cleanup_(function) __attribute__((cleanup(function)))
| Or similar, to make the gcc feature easier to use.
| Making this logic public has the benefit that we can remove three header files whose only purpose was to define these functions internally.
| See #2008.
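The sketch below shows how a "p"-suffix unref call pairs with the cleanup attribute. The `_cleanup_()` macro is the one suggested in the commit message; `struct thing` and its `thing_new()`, `thing_unref()` and `thing_unrefp()` functions are hypothetical stand-ins for a reference-counted library object, and the attribute itself is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdlib.h>

/* Convenience macro as suggested in the commit message (GCC and Clang only). */
#define _cleanup_(function) __attribute__((cleanup(function)))

/* Hypothetical reference-counted object. */
struct thing {
        unsigned n_ref;
};

int things_alive = 0; /* number of live objects, so the effect is observable */

struct thing *thing_new(void) {
        struct thing *t = calloc(1, sizeof *t);
        if (t) {
                t->n_ref = 1;
                things_alive++;
        }
        return t;
}

/* Drop one reference and free the object on the last one; always returns NULL. */
struct thing *thing_unref(struct thing *t) {
        if (t && --t->n_ref == 0) {
                free(t);
                things_alive--;
        }
        return NULL;
}

/* "p"-suffix variant: takes a pointer to the variable, as the cleanup attribute requires. */
void thing_unrefp(struct thing **t) {
        if (*t)
                thing_unref(*t);
}

/* The reference is dropped automatically on every return path. */
int use_thing(void) {
        _cleanup_(thing_unrefp) struct thing *t = thing_new();
        if (!t)
                return -1;
        return (int) t->n_ref;
}
```

The return value is evaluated before the cleanup handler runs, so `use_thing()` can still read the object in its `return` statement; afterwards `things_alive` is back at zero.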
* core: Do not bind a mount unit to a device, if it was from mountinfo [Harald Hoyer, 2015-11-24, 1 file, -1/+1]
| If a mount unit is bound to a device, systemd tries to umount the mount point if it thinks the device has gone away.
| Due to the uevent queue and inotify of /proc/self/mountinfo being two different sources, systemd can never get the ordering reliably correct.
| It can happen that ADD,REMOVE,ADD is queued in the uevent queue and an inotify of mountinfo (or libmount event) happened with the device in question. systemd cannot know at which point of time the mount happened in the ADD,REMOVE,ADD sequence.
| The real ordering might have been ADD,REMOVE,ADD,mount and systemd might think ADD,mount,REMOVE,ADD and would umount the mountpoint.
| A test script which triggered this behaviour is:
|     rm -f test-efi-disk.img
|     dd if=/dev/null of=test-efi-disk.img bs=1M seek=512 count=1
|     parted --script test-efi-disk.img \
|       "mklabel gpt" \
|       "mkpart ESP fat32 1MiB 511MiB" \
|       "set 1 boot on"
|     LOOP=$(losetup --show -f -P test-efi-disk.img)
|     udevadm settle
|     mkfs.vfat -F32 ${LOOP}p1
|     mkdir -p mnt
|     mount ${LOOP}p1 mnt
|     ... <dostuffwith mnt>
| Without the "udevadm settle" systemd unmounted mnt while the script was operating on mnt.
| Of course the question is why there was a REMOVE in the first place, but this is not part of this patch.
* socket: Add support for socket protcol [Susant Sahani, 2015-11-18, 1 file, -0/+13]
| Currently we don't support socket protocols like sctp and udplite.
| This patch adds a new config param SocketProtocol: udplite/sctp
| With this we can now configure the protocol as
|     udplite = IPPROTO_UDPLITE
|     sctp = IPPROTO_SCTP
| Tested with nspawn:
* core: drop "override" flag when building transactions [Lennart Poettering, 2015-11-12, 1 file, -2/+2]
| Now that we don't have RequiresOverridable= and RequisiteOverridable= dependencies anymore, we can get rid of tracking the "override" boolean for jobs in the job engine, as it serves no purpose anymore.
| While we are at it, fix some error messages we print when invoking functions that take the override parameter.
* core: simplify things a bit by checking default_dependencies boolean in callee, not caller [Lennart Poettering, 2015-11-11, 1 file, -5/+6]
| It's nicer to hide the check away in the various xyz_add_default_dependencies() calls, rather than making it explicit in the caller, and thus require deeper nesting.
* core: change type of distribute_fds() prototype to return void [Lennart Poettering, 2015-11-10, 1 file, -3/+1]
| We can't handle errors of this call sanely anyway, and we never actually return any errors from the unit type that implements the call. Hence, let's make this void, in order to simplify things.
* core: all unit types that watch control PIDs should use the same logic [Lennart Poettering, 2015-10-27, 1 file, -4/+3]
| When coldplugging the unit state, make sure to follow the same basic logic for all unit types: always verify whether the control PID is still a waitable process before proceeding.
* process-util: move a couple of process-related calls over [Lennart Poettering, 2015-10-27, 1 file, -0/+1]
|