summaryrefslogtreecommitdiff
path: root/src/nspawn/nspawn-mount.h
Commit message (Collapse)AuthorAgeFilesLines
* tree-wide: remove Lennart's copyright linesLennart Poettering2018-06-141-4/+0
| | | | | | | | | | | These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurbLennart Poettering2018-06-141-2/+0
| | | | | | | | | | | | | | | | This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* tree-wide: unify how we define bit mak enumsLennart Poettering2018-06-121-6/+6
| | | | | | Let's always write "1 << 0", "1 << 1" and so on, except where we need more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force 64bit values.
* nspawn: lock down a few things in /proc by defaultLennart Poettering2018-05-031-6/+7
| | | | | | | | | | | This tightens security on /proc: a couple of files exposed there are now made inaccessible. These files might potentially leak kernel internals or expose non-virtualized concepts, hence lock them down by default. Moreover, a couple of dirs in /proc that expose stuff also exposed in /sys are now marked read-only, similar to how we handle /sys. The list is taken from what docker/runc based container managers generally apply, but slightly extended.
* nspawn: size_t more stuffLennart Poettering2018-05-031-7/+7
| | | | A follow-up for #8840
* tree-wide: drop license boilerplateZbigniew Jędrzejewski-Szmek2018-04-061-13/+0
| | | | | | | | | | Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd nice to obtain explicit acks from all involved authors before doing that.
* Add SPDX license identifiers to source files under the LGPLZbigniew Jędrzejewski-Szmek2017-11-191-0/+1
| | | | | This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
* nspawn: Add support for sysroot pivoting (#5258)Philip Withnall2017-02-081-0/+3
| | | | | | | | | Add a new --pivot-root argument to systemd-nspawn, which specifies a directory to pivot to / inside the container; while the original / is pivoted to another specified directory (if provided). This adds support for booting container images which may contain several bootable sysroots, as is common with OSTree disk images. When these disk images are booted on real hardware, ostree-prepare-root is run in conjunction with sysroot.mount in the initramfs to achieve the same results.
* nspawn: split out VolatileMode definitionsLennart Poettering2016-12-201-10/+1
| | | | | This moves the VolatileMode enum and its helper functions to src/shared/. This is useful to then reuse them to implement systemd.volatile= in a later commit.
* nspawn: optionally, automatically allocated --bind=/--overlay source from ↵Lennart Poettering2016-12-011-0/+1
| | | | | | | | | | | /var/tmp This extends the --bind= and --overlay= syntax so that an empty string as source/upper directory is taken as request to automatically allocate a temporary directory below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination with the "+" path extension this permits a switch "--overlay=+/var::/var" in order to use the container's shipped /var, combine it with a writable temporary directory and mount it to the runtime /var of the container.
* nspawn: permit prefixing of source paths in --bind= and --overlay= with "+"Lennart Poettering2016-12-011-3/+1
| | | | | | | | | | | | | If a source path is prefixed with "+" it is taken relative to the container's root directory instead of the host. This permits easily establishing bind and overlay mounts based on data from the container rather than the host. This also reworks custom_mounts_prepare(), and turns it into two functions: one custom_mount_check_all() that remains in nspawn.c but purely verifies the validity of the custom mounts configured. And one called custom_mount_prepare_all() that actually does the preparation step, sorts the custom mounts, resolves relative paths, and allocates temporary directories as necessary.
* nspawn: split out overlayfs argument parsing into a function of its ownLennart Poettering2016-12-011-0/+2
| | | | | Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().
* nspawn: R/W support for /sys, and /proc/sysSergiusz Urbaniak2016-11-181-2/+11
| | | | | | | | | | | | | | | | | | | | This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature. If set to "yes", /sys, and /proc/sys will be read-write. If set to "no", /sys, and /proc/sys will be read-only. If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace. This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem. This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.
* core: use the unified hierarchy for the systemd cgroup controller hierarchyTejun Heo2016-08-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities. Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logics and would allow using features which depend on the unified hierarchy membership such bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities. This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers. * cg_unified(@controller) is introduced which tests whether the specific controller in on unified hierarchy and used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified(). * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller. * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all. * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration. * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host. v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode. - Various style updates. v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2. v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
* nspawn: add SYSTEMD_NSPAWN_USE_CGNS env variable (#3809)Christian Brauner2016-07-261-1/+1
| | | SYSTEMD_NSPAWN_USE_CGNS allows to disable the use of cgroup namespaces.
* tree-wide: remove Emacs lines from all filesDaniel Mack2016-02-101-2/+0
| | | | | This should be handled fine now by .dir-locals.el, so need to carry that stuff in every file.
* nspawn: skip /sys-as-tmpfs if we don't use private-networkIago López Galeiras2015-10-201-1/+1
| | | | | | | | | | | | | | | | | | Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over the network namespace. So the mounting /sys as a tmpfs code introduced in d8fc6a000fe21b0c1ba27fbfed8b42d00b349a4b doesn't work with user namespaces if we don't use private-net. The reason is that we mount sysfs inside the container and we're in the network namespace of the host but we don't have CAP_SYS_ADMIN over that namespace. To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use private network and ignore the /sys-as-a-tmpfs code if we find that /sys is already mounted as sysfs. Fixes #1555
* nspawn: mount /sys as tmpfs, and then mount only select subdirs of the real ↵Lennart Poettering2015-09-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | sysfs below it This way we can hide things like /sys/firmware or /sys/hypervisor from the container, while keeping the device tree around. While this is a security benefit in itself it also allows us to fix issue #1277. Previously we'd mount /sys before creating the user namespace, in order to be able to mount /sys/fs/cgroup/* beneath it (which resides in it), which we can only mount outside of the user namespace. To ensure that the user namespace owns the network namespace we'd set up the network namespace at the same time as the user namespace. Thus, we'd still see the /sys/class/net/ from the originating network namespace, even though we are in our own network namespace now. With this patch, /sys is mounted before transitioning into the user namespace as tmpfs, so that we can also mount /sys/fs/cgroup/* into it this early. The directories such as /sys/class/ are then later added in from the real sysfs from inside the network and user namespace so that they actually show whatis available in it. Fixes #1277
* nspawn: fix user namespace supportLennart Poettering2015-09-301-1/+1
| | | | | We didn#t actually pass ownership of /run to the UID in the container since some releases, let's fix that.
* nspawn: split out mount related functions into a new nspawn-mount.c fileLennart Poettering2015-09-071-0/+70