path: root/src
Commit message | Author | Age | Files | Lines
* seccomp: add sched_yield syscall to the @default syscall setDjalal Harouni2017-10-041-0/+1
|
* Merge pull request #6946 from poettering/synthesize-dnsZbigniew Jędrzejewski-Szmek2017-10-037-43/+82
|\  Some DNS RR synthesizing fixes
| * resolved: synthesize records for the full local hostname, tooLennart Poettering2017-09-291-3/+12
| | This was forgotten, let's add it too, so that the llmnr, mdns and full hostname RRs are all synthesized if needed.
| * resolved: make sure a non-existing PTR record never gets mangled into NODATALennart Poettering2017-09-291-9/+26
| | Previously, if a PTR query was seen for a non-existing record, we'd generate an empty response (but not NXDOMAIN or so). Fix that. If we have no data about an IP address, then let's say so, so that the original error is returned, instead of anything synthesized.
| |
| | Fixes: #6543
| * resolved: when there is no gateway, make sure _gateway results in NXDOMAINLennart Poettering2017-09-292-11/+34
| | Let's ensure that "no gateway" translates to "no domain", instead of an empty reply. This is in line with what nss-myhostname does in the same case, hence let's unify the behaviour of nss-myhostname and resolved here.
| * sd-bus: drop bloom fieldsLennart Poettering2017-09-291-3/+0
| | These fields are unused since kdbus support has been removed.
| * sd-bus: drop match cookie conceptLennart Poettering2017-09-295-15/+6
| | The match cookie was used by kdbus to uniquely identify the matches we install. But given that kdbus is gone, the cookie serves no purpose anymore, so let's kill it.
| * sd-bus: when showing brief message info show error name in debug output tooLennart Poettering2017-09-291-2/+4
| | When debug logging is enabled we show brief information about every bus message we send or receive. Pretty much all information is shown, except for the error name if a message is an error (interestingly, we do print the error text however). Fix that, and add the error name as well.
* | seccomp: remove '@credentials' syscall set (#6958)Djalal Harouni2017-10-033-30/+22
| | This removes the '@credentials' syscall set that was added in commit v234-468-gcd0ddf6f75. Most of these syscalls are so simple that we do not want to filter them. They work on the calling process, perform only read operations, and do not have a deep kernel path. The only potential problem is the 'capget' syscall, since it can query arbitrary processes and could be used to discover processes; however, sending signal 0 to arbitrary processes can already be used to discover whether a process exists. It is unfortunate that Linux allows querying processes of different users. Let's put it in the '@process' syscall set for now; later we may add it to a new '@basic-process' set that allows the most basic process operations.
* | Merge pull request #6940 from poettering/magic-dirsYu Watanabe2017-10-0320-98/+746
|\ \  make sure StateDirectory= and friends play nicely with DynamicUser= and RootImage=/RootDirectory=
| * | core: fix special directories for user servicesLennart Poettering2017-10-021-3/+3
| | | The system paths were listed where the user paths should have been listed. Correct that.
| * | path-util: some updates to path_make_relative()Lennart Poettering2017-10-022-8/+19
| | | Don't miscount the number of "../" to generate if "." is included in an input path. Also, refuse if we encounter "../", since we can't possibly follow that up properly without file system access. Some other modernizations.
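The two rules described above (ignore "." when counting "../", refuse ".." outright) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the C code from path-util; the function name mirrors the one in the commit title, but the signature and error type are chosen here for illustration only.

```python
def path_make_relative(from_dir: str, to_path: str) -> str:
    """Compute a relative path from from_dir to to_path purely lexically."""
    if not (from_dir.startswith("/") and to_path.startswith("/")):
        raise ValueError("both paths must be absolute")

    def components(path: str) -> list:
        parts = []
        for part in path.split("/"):
            if part in ("", "."):
                continue  # empty and "." components must not be counted
            if part == "..":
                # Resolving ".." correctly would need file system access
                # (symlinks), so it is refused instead of guessed.
                raise ValueError("'..' cannot be resolved lexically")
            parts.append(part)
        return parts

    src, dst = components(from_dir), components(to_path)

    # Skip the common leading components.
    common = 0
    while common < len(src) and common < len(dst) and src[common] == dst[common]:
        common += 1

    # One "../" per remaining source component, then the rest of the target.
    result = [".."] * (len(src) - common) + dst[common:]
    return "/".join(result) if result else "."
```

For example, `path_make_relative("/foo/./bar", "/foo/baz")` yields `../baz`: the "." in the first path does not add an extra "../".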
| * | core: fix StateDirectory= (and friends) safety checks when decoding transient unit propertiesLennart Poettering2017-10-023-7/+7
| | | Let's make sure relative directories such as "foo/bar" are accepted, by using the same validation checks as in unit file parsing.
| * | test: add test for DynamicUser= + StateDirectory=Lennart Poettering2017-10-021-2/+3
| | | Also, tests for DynamicUser= should really run in system mode, as we allocate from a system resource.
| | |
| | | (This also increases the test timeout to 2min. If one of our tests really hangs then waiting for 2min longer doesn't hurt either. The old 2s is really short, given that we run in potentially slow VM environments for this test. This becomes noticeable when the slow "find" command this adds is triggered.)
| * | core: pass the correct error to the callerLennart Poettering2017-10-021-1/+2
| | |
| * | core: when looking for a UID to use for a dynamic UID start with the current owner of the StateDirectory= and friendsLennart Poettering2017-10-023-20/+124
| | | Let's optimize dynamic UID allocation a bit: if a StateDirectory= (or suchlike) is configured, we start our allocation loop from the UID that owns it and use it if it currently isn't used otherwise. This is beneficial as it saves us from having to expensively recursively chown() these directories in the typical case (which StateDirectory= does when it notices that the owner of the directory doesn't match the UID picked).
| | |
| | | With this in place we now have a three-phase logic for allocating a dynamic UID:
| | |
| | | a) first, we try to use the owning UID of StateDirectory=, CacheDirectory=, LogDirectory= if that exists and is currently otherwise unused.
| | | b) if that didn't work out, we hash the UID from the service name
| | | c) if that didn't yield an unused UID either, randomly pick new ones until we find a free one.
| * | execute: make StateDirectory= and friends compatible with DynamicUser=1 and RootDirectory=/RootImage=Lennart Poettering2017-10-024-11/+297
| | | Let's clean up the interaction of StateDirectory= (and friends) with DynamicUser=1: instead of creating these directories directly below /var/lib, place them in /var/lib/private instead if DynamicUser=1 is set, making that directory 0700 and owned by root:root. This way, if a dynamic UID is later reused, access to the old run's state directory is prohibited for that user. Then, use file system namespacing inside the service to make /var/lib/private a readable tmpfs, hiding all state directories that are not listed in StateDirectory=, and making access to the actual state directory possible. Mount all directories listed in StateDirectory= to the same places inside the service (which means they'll now be mounted into the tmpfs instance). Finally, add a symlink from the state directory name in /var/lib/ to the one in /var/lib/private, so that both the host and the service can access the path under the same location.
| | |
| | | Here's an example: let's say a service runs with StateDirectory=foo. When DynamicUser=0 is set, it will get the following setup, and no difference between what the unit and what the host sees:
| | |
| | |     /var/lib/foo (created as directory)
| | |
| | | Now, if DynamicUser=1 is set, we'll instead get this on the host:
| | |
| | |     /var/lib/private (created as directory with mode 0700, root:root)
| | |     /var/lib/private/foo (created as directory)
| | |     /var/lib/foo → private/foo (created as symlink)
| | |
| | | And from inside the unit:
| | |
| | |     /var/lib/private (a tmpfs mount with mode 0755, root:root)
| | |     /var/lib/private/foo (bind mounted from the host)
| | |     /var/lib/foo → private/foo (the same symlink as above)
| | |
| | | This takes inspiration from how container trees are protected below /var/lib/machines: they generally reuse UIDs/GIDs of the host, but because /var/lib/machines itself is set to 0700 host users cannot access files in the container tree even if the UIDs/GIDs are reused. However, for this commit we add one further trick: inside and outside of the unit /var/lib/private is a different thing: outside it is a plain, inaccessible directory, and inside it is a world-readable tmpfs mount with only the whitelisted subdirs below it, bind mounted in. This means, from the outside the dir acts as an access barrier, but from the inside it does not. And the symlink created in /var/lib/foo itself points across the barrier in both cases, so that root and the unit's user always have access to these dirs without knowing the details of this mounting magic.
| | |
| | | This logic resolves a major shortcoming of DynamicUser=1 units: previously they couldn't safely store persistent data. With this change they can have their own private state, log and data directories, which they can write to, but which are protected from UID recycling.
| | |
| | | With this change, if RootDirectory= or RootImage= are used it is ensured that the specified state/log/cache directories are always mounted in from the host. This change of semantics I think is much preferable since this means the root directory/image logic can be used easily for read-only resource bundling (as all writable data resides outside of the image). Note that this is a change of behaviour, but given that we haven't released any systemd version with StateDirectory= and friends implemented this should be a safe change to make (in particular as previously it wasn't clear what would actually happen when used in combination). Moreover, by making this change we can later add a "+" modifier to these settings too, working similarly to the same modifier in ReadOnlyPaths= and friends, making specified paths relative to the container itself.
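The host-side layout from the example above can be reproduced in a scratch directory. The following is a minimal Python sketch, assuming a POSIX system; `setup_state_directory` and its parameters are hypothetical names, ownership changes are skipped, and the unit-side tmpfs and bind mounts are not modelled since they require privileges.

```python
import os
import stat
import tempfile


def setup_state_directory(var_lib: str, name: str, dynamic_user: bool) -> None:
    """Create the host-side directories for StateDirectory=<name> under var_lib."""
    if not dynamic_user:
        # DynamicUser=0: a plain directory, same view for host and unit.
        os.makedirs(os.path.join(var_lib, name), exist_ok=True)
        return

    # DynamicUser=1: the real directory lives below an inaccessible barrier.
    private = os.path.join(var_lib, "private")
    os.makedirs(private, exist_ok=True)
    os.chmod(private, 0o700)
    os.makedirs(os.path.join(private, name), exist_ok=True)

    # Relative symlink, so it resolves the same way on the host and in the unit.
    link = os.path.join(var_lib, name)
    if not os.path.islink(link):
        os.symlink(os.path.join("private", name), link)


with tempfile.TemporaryDirectory() as root:
    setup_state_directory(root, "foo", dynamic_user=True)
    link_target = os.readlink(os.path.join(root, "foo"))
    private_mode = stat.S_IMODE(os.stat(os.path.join(root, "private")).st_mode)
    resolves_to_dir = os.path.isdir(os.path.join(root, "foo"))
```

The relative symlink target is the detail that lets the same link work on both sides of the barrier: it points at `private/foo` regardless of whether `private` is the host directory or the tmpfs seen inside the unit.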
| * | namespace: if we can create the destination of bind and PrivateTmp= mountsLennart Poettering2017-10-021-4/+37
| | | When putting together the namespace, always create the file or directory we are supposed to bind mount on, the same way we do it for most other stuff, for example mount units or systemd-nspawn's --bind= option. This has the big benefit that we can use namespace bind mounts on dirs in /tmp or /var/tmp even in conjunction with PrivateTmp=.
| * | namespace: properly handle bind mounts from the hostLennart Poettering2017-10-021-22/+40
| | | Before this patch we had an ordering problem: if we have no namespacing enabled except for two bind mounts that intend to swap /a and /b via bind mounts, then we'd execute the bind mount binding /b to /a, followed by the bind mount from /a to /b, thus having the effect that /b is now visible in both /a and /b, which was not intended. With this change, as soon as any bind mount is configured we'll put together the service mount namespace in a temporary directory instead of operating directly in the root. This solves the problem in a straightforward fashion: the source of bind mounts will always refer to the host, and thus be unaffected by the bind mounts we already created.
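The ordering problem can be shown with a toy model in which a "file system" is a dictionary from path to contents and a bind mount copies an entry. This is only a Python simulation of the reasoning, under that simplifying assumption; no real mounts are involved and both function names are made up for the example.

```python
def apply_in_place(root: dict, bind_mounts: list) -> dict:
    """Old behaviour: each source is looked up in the already-modified tree."""
    view = dict(root)
    for source, destination in bind_mounts:
        view[destination] = view[source]
    return view


def apply_staged(host: dict, bind_mounts: list) -> dict:
    """New behaviour: build a separate tree, sources always refer to the host."""
    staged = dict(host)
    for source, destination in bind_mounts:
        staged[destination] = host[source]
    return staged


host = {"/a": "contents of a", "/b": "contents of b"}
swap = [("/b", "/a"), ("/a", "/b")]  # (source, destination) pairs

broken = apply_in_place(host, swap)  # /b ends up visible in both places
fixed = apply_staged(host, swap)     # /a and /b are actually swapped
```

In the in-place variant the second mount reads `/a` after it has already been overmounted with `/b`, which is exactly the effect described in the commit message.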
| * | namespace: create /dev, /proc, /sys when neededLennart Poettering2017-10-021-0/+6
| | | We already create /dev implicitly if PrivateTmp=yes is on, if it is missing. Do so too for the other two API VFS, as well as for /dev if PrivateTmp=yes is off but MountAPIVFS=yes is on (i.e. when /dev is bind mounted from the host).
| * | core: usually our enum's _INVALID and _MAX special values are named after the full typeLennart Poettering2017-10-026-21/+21
| | | In most cases we followed the rule that the special _INVALID and _MAX values we use in our enums use the full type name as prefix (in contrast to regular values that we often make shorter). Do so for ExecDirectoryType as well. No functional changes, just a little bit of renaming to make this code more like the rest.
| * | core: chown() StateDirectory= and friends recursively when starting a serviceLennart Poettering2017-10-024-4/+190
| | | This is particularly useful when used in conjunction with DynamicUser=1, where the UID might change for every invocation, but is useful in other cases too, for example, when these directories are shared between systems where the UID assignments differ slightly.
| * | nspawn: properly report all kinds of changed UID/GID when patching things for usernsLennart Poettering2017-10-021-0/+2
| | | We forgot to propagate one chmod().
* | | Merge pull request #6943 from poettering/dissect-roZbigniew Jędrzejewski-Szmek2017-10-023-1/+32
|\ \ \ | |/ / |/| | Automatically recognize that "squashfs" and "iso9660" are always read-only
| * | mount-util: add fusectl to list of API VFSLennart Poettering2017-09-291-0/+1
| | |
| * | dissect: split list of discard-supporting fs out into mount-util.cLennart Poettering2017-09-293-1/+14
| | | Let's manage the list of file systems that do a specific thing in one place, following similar naming. No functional changes.
| * | dissect: automatically mark partitions read-only that have a read-only file systemLennart Poettering2017-09-293-0/+17
| |/  Specifically, squashfs and iso9660 are always read-only, hence make sure we never even think about mounting them writable.
* | service: better detect when a Type=notify service cannot become active anymore (#6959)Jouke Witteveen2017-10-021-2/+2
| | No need to wait for a timeout when we know things are not going to work out. When the main process goes away and only notifications from the main process are accepted, then we will not receive any notifications anymore.
* | Merge pull request #6941 from andir/use-in_setZbigniew Jędrzejewski-Szmek2017-10-0271-191/+152
|\ \  use IN_SET where possible
| * | Minor line wrapping adjustmentZbigniew Jędrzejewski-Szmek2017-10-022-3/+12
| | |
| * | tree-wide: use `!IN_SET(..)` for `a != b && a != c && …`Andreas Rammhold2017-10-0223-60/+38
| | | The included cocci was used to generate the changes. Thanks to @flo-wer for pointing this case out.
| * | tree-wide: use IN_SET where possibleAndreas Rammhold2017-10-0257-131/+105
| |/  In addition to the changes from #6933 this handles cases that could be matched with the included cocci file.
* | service: accept the fact that the three xyz_good() functions return intsLennart Poettering2017-10-021-7/+7
| | Currently, cgroup_good(), main_pid_good() and control_pid_good() all return an "int" (two of them propagate errors). It's a good thing to keep the three functions similar, so let's leave it at that, but then let's clean up the invocation of the three functions so that they always clearly acknowledge that the return value is not a bool, but potentially negative.
* | service: drop _pure_ decorator on static functionLennart Poettering2017-10-021-1/+1
| | The compiler should be good enough to figure this out on its own if this is a static function, and it makes control_pid_good() an outlier anyway, and decorators like this tend to bitrot. Hence, to keep things simple and automatic, let's just drop the decorator.
* | service: a cgroup empty notification isn't reason enough to go downLennart Poettering2017-10-021-2/+7
| | The processes associated with a service are not just the ones in its cgroup, but also the control and main processes, which might possibly live outside of it, for example if they transitioned into their own cgroups because they registered a PAM session of their own. Hence, if we get a cgroup empty notification, always check if the main PID is still around before taking action too eagerly.
| |
| | Fixes: #6045
* | service: add explanatory comments to control_pid_good() and cgroup_good()Lennart Poettering2017-10-021-0/+7
| | Let's add a similar comment to each as we already have for main_pid_good(), emphasizing that these functions are supposed to behave very similarly.
* | service: fix main_pid_good() commentLennart Poettering2017-10-021-2/+1
|/  We don't actually return -1, don't claim that.
* meson: move library version defines to the top (#6939)Zbigniew Jędrzejewski-Szmek2017-09-281-1/+1
|
* Merge pull request #6933 from yuwata/use_in_setLennart Poettering2017-09-2819-92/+74
|\  use IN_SET macro
| * libsystemd: use IN_SET macroYu Watanabe2017-09-2818-87/+69
| |
| * networkd: use assert_not_reached()Yu Watanabe2017-09-281-2/+2
| |
| * networkd: use IN_SET macroYu Watanabe2017-09-281-3/+3
| |
* | Merge pull request #6924 from andir/vrf-dhcpv4Lennart Poettering2017-09-284-5/+17
|\ \  networkd: use VRFs routing table for DHCP routes
| * | networkd: use VRFs routing table for DHCP routesAndreas Rammhold2017-09-274-5/+17
| | | When an interface has been enslaved to a VRF, the received routes should be added to the VRF's routing table instead of the main table. This change modifies the default behaviour of routes in the case where a network belongs to a VRF. When the user does not configure a `DHCP.RouteTable` in a `systemd.network` file and the interface belongs to a VRF, the VRF's routing table is used instead of RT_TABLE_MAIN. When the user has configured a custom routing table for DHCP, the VRF's table is ignored and the user's preference takes precedence.
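The precedence described above boils down to a three-way fallback. The sketch below is a Python illustration of that decision only; `dhcp_route_table` and its parameters are hypothetical names, and `None` stands for "not configured" or "not part of a VRF".

```python
from typing import Optional

RT_TABLE_MAIN = 254  # numeric ID of the Linux main routing table


def dhcp_route_table(configured_table: Optional[int],
                     vrf_table: Optional[int]) -> int:
    """Choose the routing table for routes learned via DHCP."""
    if configured_table is not None:
        return configured_table  # explicit user setting always wins
    if vrf_table is not None:
        return vrf_table         # interface is enslaved to a VRF
    return RT_TABLE_MAIN         # previous default behaviour
```

For instance, an interface in a VRF with table 42 and no explicit setting gets 42, while the same interface with a configured table of 100 gets 100.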
* | | udev-rules: all values can contain escaped double quotes now (#6890)Franck Bui2017-09-282-9/+23
| | | This is primarily useful to support escaped double quotes in PROGRAM or IMPORT{program} directives. The only possibility before this patch was to use an external shell script, but this seems too cumbersome for trivial logic such as PROGRAM=="/bin/sh -c 'FOO=\"%s{model}\"; echo ${FOO:0:4}'" or any similar shell construct that needs to deal with patterns including whitespace. As is the case for single quotes and for directives running a program, words within escaped double quotes will be considered as a single argument.
| | |
| | | Fixes: #6835
* | | Merge pull request #6928 from poettering/cgroup-empty-raceZbigniew Jędrzejewski-Szmek2017-09-2814-65/+168
|\ \ \ | |_|/ |/| | rework cgroup empty notification handling (i.e. a fix for #6608)
| * | core: log unit failure with type-specific result codeLennart Poettering2017-09-279-4/+26
| | | This slightly changes how we log about failures. Previously, service_enter_dead() would log that a service unit failed along with its result code, and unit_notify() would do this again but without the result code. For other unit types only the latter would take effect. This cleans this up: we keep the message in unit_notify() only for debug purposes, and add type-specific log lines to all our unit types that can fail, and always place them before unit_notify() is invoked. Or in other words: the duplicate log message for service units is removed, and all other unit types get a more useful line with the precise result code.
| * | core: free_and_strdup() FTW!Lennart Poettering2017-09-271-12/+6
| | |
| * | cgroup: IN_SET() FTW!Lennart Poettering2017-09-271-1/+1
| | |
| * | cgroup: after determining that a cgroup is empty, asynchronously dispatch thisLennart Poettering2017-09-277-24/+112
| | | This makes sure that if we learn via inotify or another event source that a cgroup is empty, and we checked that this is indeed the case (as we might get spurious notifications through inotify, as the inotify logic through the "cgroup.events" file is pretty unspecific and might be triggered for a variety of reasons), then we'll enqueue a defer event for it, at a priority lower than SIGCHLD handling, so that we know for sure that if there's waitid() data for a process we used it before considering the cgroup empty notification.
| | |
| | | Fixes: #6608
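The effect of dispatching the cgroup-empty event at a lower priority than SIGCHLD handling can be sketched with a small priority queue. This is a Python model of the idea, not the sd-event API: the `EventLoop` class and both priority constants are invented for the example, and only their relative order matters (smaller number means dispatched earlier).

```python
import heapq
import itertools

# Hypothetical priority values; only the ordering between them matters.
PRIORITY_SIGCHLD = -6
PRIORITY_CGROUP_EMPTY = -5


class EventLoop:
    """Toy event loop that dispatches pending events in priority order."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # keeps FIFO order within a priority

    def enqueue(self, priority, name, callback):
        heapq.heappush(self._queue, (priority, next(self._counter), name, callback))

    def run(self):
        order = []
        while self._queue:
            _, _, name, callback = heapq.heappop(self._queue)
            callback()
            order.append(name)
        return order


log = []
loop = EventLoop()
# The cgroup-empty notification arrives first, but is only deferred...
loop.enqueue(PRIORITY_CGROUP_EMPTY, "cgroup-empty", lambda: log.append("handle empty cgroup"))
# ...so pending SIGCHLD/waitid() data is still processed before it.
loop.enqueue(PRIORITY_SIGCHLD, "sigchld", lambda: log.append("reap child"))
order = loop.run()
```

Because the empty-cgroup handler is queued rather than run immediately, the exit status of the process is already known by the time the unit reacts to the empty cgroup.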