path: root/src/nspawn/nspawn-mount.h
Commit message | Author | Age | Files | Lines
* tree-wide: use -EINVAL for enum invalid values | Zbigniew Jędrzejewski-Szmek | 2021-02-10 | 1 | -1/+1
  As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617.

  This does not touch anything exposed in src/systemd. Changing the defines there would be a compatibility break.

  Note that tests are broken after this commit. They will be fixed in the next one.
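The convention named in the commit above can be sketched as follows: the enum's "invalid" sentinel is set to -EINVAL, so a lookup helper can hand the sentinel straight back as an errno-style error. The names `ColorMode` and `color_mode_from_string()` are hypothetical illustrations and are not taken from the systemd tree.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical enum: the invalid sentinel doubles as a negative errno value. */
typedef enum ColorMode {
        COLOR_MODE_OFF,
        COLOR_MODE_ON,
        _COLOR_MODE_MAX,
        _COLOR_MODE_INVALID = -EINVAL,
} ColorMode;

/* Returns the matching value, or _COLOR_MODE_INVALID (that is, -EINVAL)
 * when the string is NULL or not recognized. */
static ColorMode color_mode_from_string(const char *s) {
        if (s && strcmp(s, "off") == 0)
                return COLOR_MODE_OFF;
        if (s && strcmp(s, "on") == 0)
                return COLOR_MODE_ON;
        return _COLOR_MODE_INVALID;
}
```

A caller can then write `if (mode < 0) return mode;` and propagate the error without translating it first.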
* license: LGPL-2.1+ -> LGPL-2.1-or-later | Yu Watanabe | 2020-11-09 | 1 | -1/+1
* mount-util: switch most mount_verbose() code over to not follow symlinks | Lennart Poettering | 2020-09-23 | 1 | -0/+1
* nspawn: rework how /run/host/ is set up | Lennart Poettering | 2020-07-23 | 1 | -0/+1
  Let's find the right os-release file on the host side, and only mount the one that matters, i.e. /etc/os-release if it exists and /usr/lib/os-release otherwise. Use the fixed path /run/host/os-release for that.

  Let's also mount /run/host as a bind mount on itself before we set up /run/host, and let's mount it MS_RDONLY after we are done, so that it remains immutable as a whole.
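The selection rule described in the commit above (prefer /etc/os-release, fall back to /usr/lib/os-release, expose the result at a fixed path) might look roughly like the sketch below. The function names and the `OS_RELEASE_TARGET` macro are hypothetical; only the three paths come from the commit message, and the real nspawn code is not reproduced here.

```c
#include <stdbool.h>
#include <unistd.h>

/* Fixed location inside the container, as named in the commit message. */
#define OS_RELEASE_TARGET "/run/host/os-release"

/* Pure decision step, kept separate so it can be checked without touching the file system. */
static const char *pick_os_release_source(bool etc_exists) {
        return etc_exists ? "/etc/os-release" : "/usr/lib/os-release";
}

/* Probes the host and returns the file that would be mounted at OS_RELEASE_TARGET. */
static const char *find_os_release_source(void) {
        return pick_os_release_source(access("/etc/os-release", F_OK) == 0);
}
```

Only the chosen file would then be bind mounted to `OS_RELEASE_TARGET`; the mount calls themselves are left out of this sketch.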
* nspawn: implement container host os-release interface | Luca Boccassi | 2020-06-23 | 1 | -0/+1
* nspawn: be more careful with creating/chowning directories to overmount | Lennart Poettering | 2020-04-28 | 1 | -0/+1
  We should never re-chown selinuxfs.

  Fixes: #15475
* nspawn: Don't mount read-only if we have a custom mount on root. | Daan De Meyer | 2020-01-03 | 1 | -0/+1
* nspawn-mount: Remove unused parameters | Daan De Meyer | 2019-12-12 | 1 | -2/+2
* nspawn: Enable specifying root as the mount target directory. | Daan De Meyer | 2019-12-12 | 1 | -1/+3
  Fixes #3847.
* nspawn: add support for executing OCI runtime bundles with nspawn | Lennart Poettering | 2019-03-15 | 1 | -3/+8
  This is a pretty large patch, and adds support for OCI runtime bundles to nspawn. A new switch --oci-bundle= is added that takes a path to an OCI bundle. The JSON file included therein is read similarly to a .nspawn settings file, however with a different feature set.

  Implementation-wise this mostly extends the pre-existing Settings object to carry additional properties for OCI. However, OCI supports some concepts .nspawn files did not support yet, which this patch also adds:

  1. Support for "masking" files and directories. This functionality is now also available via the new --inaccessible= cmdline command, and Inaccessible= in .nspawn files.
  2. Support for mounting arbitrary file systems. (not exposed through nspawn cmdline nor .nspawn files, because probably not a good idea)
  3. Ability to configure the console settings for a container. This functionality is now also available on the nspawn cmdline in the new --console= switch (not added to .nspawn for now, as it is something specific to the invocation really, not a property of the container)
  4. Console width/height configuration. Not exposed through .nspawn/cmdline, but this may be controlled through $COLUMNS and $LINES like in most other UNIX tools.
  5. UID/GID configuration by raw numbers. (not exposed in .nspawn and on the cmdline, since containers likely have different user tables, and the existing --user= switch appears to be the better option)
  6. OCI hook commands (not exposed in .nspawn/cmdline, as very specific to OCI)
  7. Creation of additional device nodes in /dev. Most likely not a good idea, hence not exposed in .nspawn/cmdline. There's already --bind= to achieve the same, which is the better alternative.
  8. Explicit syscall filters. This is not a good idea, due to the skewed arch support, hence not exposed through .nspawn/cmdline.
  9. Configuration of some sysctls on a whitelist. Questionable, not supported in .nspawn/cmdline for now.
  10. Configuration of all 5 types of capabilities. Not a useful concept, since the kernel will reduce the caps on execve() anyway. Hence not exposed through .nspawn/cmdline, as this is not very useful.

  Note that this only implements the OCI runtime logic itself. It does not provide a runc-compatible command line tool. This is left for a later PR. Only with that in place can tools such as "buildah" use the OCI support in nspawn as a drop-in replacement.

  Currently still missing is OCI hook support, but it's already parsed and everything, and should be easy to add. Other than that, OCI is implemented pretty comprehensively.

  There's a list of incompatibilities in the nspawn-oci.c file. In a later PR I'd like to convert this into proper markdown and add it to the documentation directory.
* nspawn: add volatile mode multiplexer call setup_volatile_mode() | Lennart Poettering | 2019-03-01 | 1 | -2/+1
  Just some refactoring, no change in behaviour.
* nspawn: optionally don't mount a tmpfs over /tmp (#10294) | Lennart Poettering | 2018-10-08 | 1 | -0/+1
  nspawn: optionally, don't mount a tmpfs on /tmp

  Fixes: #10260
* nspawn: Move cgroup mount stuff from nspawn-mount.c to nspawn-cgroup.c | Luke Shumaker | 2018-07-20 | 1 | -3/+2
* nspawn: Simplify tmpfs_patch_options() usage, and trickle that up | Luke Shumaker | 2018-07-20 | 1 | -1/+1
  One of the things that tmpfs_patch_options does is take an (optional) UID, and insert "uid=${UID},gid=${UID}" into the options string. So we need a uid_t argument, and a way of telling if we should use it. Fortunately, that is built in to the uid_t type by having UID_INVALID as a possible value. So this is really a feature that requires one argument.

  Yet, it is somehow taking 4! That is absurd. Simplify it to only take one argument, and have that trickle all the way up to mount_all()'s usage.

  Now, in many of the uses, the argument becomes

      uid_shift == 0 ? UID_INVALID : uid_shift

  because it used to treat uid_shift=0 as invalid unless the patch_ids flag was also set. This keeps the behavior the same. Note that in all cases where it is invoked, if !use_userns (sometimes called !userns), then uid_shift is 0; we don't have to add any checks for that.

  That said, I'm pretty sure that "uid=0" and not setting "uid=" are the same, but Christian Brauner seemed to not think so when implementing the cgns support. https://github.com/systemd/systemd/pull/3589
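The single-argument idea from the commit above can be illustrated with a small stand-alone sketch: one `uid_t` parameter, with `UID_INVALID` meaning "leave the options alone". The helper `patch_tmpfs_options()` and its buffer-based signature are hypothetical and are not the real tmpfs_patch_options(); `UID_INVALID` is defined locally so that the block compiles on its own.

```c
#include <stdio.h>
#include <sys/types.h>

/* Local definition for this sketch: the "no UID" marker. */
#define UID_INVALID ((uid_t) -1)

/* Copies options into buf, appending "uid=N,gid=N" unless uid is UID_INVALID.
 * Returns 0 on success, -1 if buf is too small. */
static int patch_tmpfs_options(const char *options, uid_t uid, char *buf, size_t size) {
        unsigned long u = (unsigned long) uid;
        int n;

        if (uid == UID_INVALID)
                n = snprintf(buf, size, "%s", options ? options : "");
        else if (options && options[0] != '\0')
                n = snprintf(buf, size, "%s,uid=%lu,gid=%lu", options, u, u);
        else
                n = snprintf(buf, size, "uid=%lu,gid=%lu", u, u);

        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

A caller that wants to keep the older behaviour described in the message would pass `uid_shift == 0 ? UID_INVALID : uid_shift` as the `uid` argument.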
* tree-wide: remove Lennart's copyright lines | Lennart Poettering | 2018-06-14 | 1 | -4/+0
  These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurb | Lennart Poettering | 2018-06-14 | 1 | -2/+0
  This part of the copyright blurb stems from the GPL use recommendations:

  https://www.gnu.org/licenses/gpl-howto.en.html

  The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that.

  Hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* tree-wide: unify how we define bit mask enums | Lennart Poettering | 2018-06-12 | 1 | -6/+6
  Let's always write "1 << 0", "1 << 1" and so on, except where we need more than 31 flag bits, where we write "UINT64(1) << 0", and so on to force 64bit values.
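A minimal sketch of the two spellings mentioned in the commit above. The flag names are hypothetical, and `UINT64()` is defined locally as a plain cast to `uint64_t`, which is an assumption about what the helper in the message does.

```c
#include <stdint.h>

/* Local stand-in for the UINT64() helper named in the commit message
 * (assumption: it simply casts its argument to a 64-bit unsigned type). */
#define UINT64(x) ((uint64_t) (x))

/* Up to 31 flag bits: plain int shifts, written uniformly as "1 << n". */
typedef enum ExampleFlags {
        EXAMPLE_FLAG_A = 1 << 0,
        EXAMPLE_FLAG_B = 1 << 1,
        EXAMPLE_FLAG_C = 1 << 2,
} ExampleFlags;

/* More than 31 flag bits: force 64-bit arithmetic before shifting,
 * since shifting a plain int by 32 or more is undefined behaviour. */
#define EXAMPLE_WIDE_FLAG_LOW  (UINT64(1) << 0)
#define EXAMPLE_WIDE_FLAG_HIGH (UINT64(1) << 40)
```

Writing every value as a shift keeps the bit position visible at a glance and makes gaps or duplicates easy to spot in review.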
* nspawn: lock down a few things in /proc by default | Lennart Poettering | 2018-05-03 | 1 | -6/+7
  This tightens security on /proc: a couple of files exposed there are now made inaccessible. These files might potentially leak kernel internals or expose non-virtualized concepts, hence lock them down by default. Moreover, a couple of dirs in /proc that expose stuff also exposed in /sys are now marked read-only, similar to how we handle /sys.

  The list is taken from what docker/runc based container managers generally apply, but slightly extended.
* nspawn: size_t more stuff | Lennart Poettering | 2018-05-03 | 1 | -7/+7
  A follow-up for #8840
* tree-wide: drop license boilerplate | Zbigniew Jędrzejewski-Szmek | 2018-04-06 | 1 | -13/+0
  Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt.

  I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.
* Add SPDX license identifiers to source files under the LGPL | Zbigniew Jędrzejewski-Szmek | 2017-11-19 | 1 | -0/+1
  This follows what the kernel is doing, cf. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
* nspawn: Add support for sysroot pivoting (#5258) | Philip Withnall | 2017-02-08 | 1 | -0/+3
  Add a new --pivot-root argument to systemd-nspawn, which specifies a directory to pivot to / inside the container; while the original / is pivoted to another specified directory (if provided). This adds support for booting container images which may contain several bootable sysroots, as is common with OSTree disk images. When these disk images are booted on real hardware, ostree-prepare-root is run in conjunction with sysroot.mount in the initramfs to achieve the same results.
* nspawn: split out VolatileMode definitions | Lennart Poettering | 2016-12-20 | 1 | -10/+1
  This moves the VolatileMode enum and its helper functions to src/shared/. This is useful to then reuse them to implement systemd.volatile= in a later commit.
* nspawn: optionally, automatically allocated --bind=/--overlay source from /var/tmp | Lennart Poettering | 2016-12-01 | 1 | -0/+1
  This extends the --bind= and --overlay= syntax so that an empty string as source/upper directory is taken as request to automatically allocate a temporary directory below /var/tmp, whose lifetime is bound to the nspawn runtime. In combination with the "+" path extension this permits a switch "--overlay=+/var::/var" in order to use the container's shipped /var, combine it with a writable temporary directory and mount it to the runtime /var of the container.
* nspawn: permit prefixing of source paths in --bind= and --overlay= with "+" | Lennart Poettering | 2016-12-01 | 1 | -3/+1
  If a source path is prefixed with "+" it is taken relative to the container's root directory instead of the host. This permits easily establishing bind and overlay mounts based on data from the container rather than the host.

  This also reworks custom_mounts_prepare(), and turns it into two functions: one custom_mount_check_all() that remains in nspawn.c but purely verifies the validity of the custom mounts configured. And one called custom_mount_prepare_all() that actually does the preparation step, sorts the custom mounts, resolves relative paths, and allocates temporary directories as necessary.
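The "+" prefix rule from the commit above can be sketched as a tiny parsing step that strips the marker and records where the path should be resolved. The helper `strip_container_prefix()` is hypothetical; the real logic lives in the custom mount preparation code, which is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>

/* Strips an optional leading "+" from a source path.
 * *in_container is set to true if the path is meant to be resolved relative
 * to the container's root directory, false if it refers to the host. */
static const char *strip_container_prefix(const char *source, bool *in_container) {
        if (source && source[0] == '+') {
                *in_container = true;
                return source + 1;
        }

        *in_container = false;
        return source;
}
```

For a source of `+/var`, this yields `/var` with `in_container` set, so a later step can prepend the container root before mounting.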
* nspawn: split out overlayfs argument parsing into a function of its own | Lennart Poettering | 2016-12-01 | 1 | -0/+2
  Add overlay_mount_parse() similar in style to tmpfs_mount_parse() and bind_mount_parse().
* nspawn: R/W support for /sys, and /proc/sys | Sergiusz Urbaniak | 2016-11-18 | 1 | -2/+11
  This commit adds the possibility to leave /sys, and /proc/sys read-write. It introduces a new (undocumented) env var SYSTEMD_NSPAWN_API_VFS_WRITABLE to enable this feature.

  If set to "yes", /sys, and /proc/sys will be read-write.
  If set to "no", /sys, and /proc/sys will be read-only.
  If set to "network" /proc/sys/net will be read-write. This is useful in use-cases, where systemd-nspawn is used in an external network namespace.

  This adds the possibility to start privileged containers which need more control over settings in the /proc, and /sys filesystem.

  This is also a follow-up on the discussion from https://github.com/systemd/systemd/pull/4018#r76971862 where an introduction of a simple env var to enable R/W support for those directories was already discussed.
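The three values listed in the commit above map naturally onto a small enum. The sketch below parses only the literal strings from the message; the enum, its member names, and the choice to treat an unset variable as read-only are assumptions made for illustration, and the real parser may accept more spellings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of the three documented settings. */
typedef enum ApiVfsWritable {
        API_VFS_READ_ONLY,        /* "no": /sys and /proc/sys stay read-only */
        API_VFS_WRITABLE,         /* "yes": /sys and /proc/sys are read-write */
        API_VFS_NETWORK,          /* "network": /proc/sys/net is read-write */
        API_VFS_UNKNOWN = -1,     /* any other value */
} ApiVfsWritable;

/* Maps the variable's value to a mode. A NULL value (variable unset) is
 * treated as read-only, which is an assumption of this sketch. */
static ApiVfsWritable api_vfs_writable_from_string(const char *v) {
        if (!v || strcmp(v, "no") == 0)
                return API_VFS_READ_ONLY;
        if (strcmp(v, "yes") == 0)
                return API_VFS_WRITABLE;
        if (strcmp(v, "network") == 0)
                return API_VFS_NETWORK;
        return API_VFS_UNKNOWN;
}

/* Reads the environment variable named in the commit message. */
static ApiVfsWritable api_vfs_writable_from_env(void) {
        return api_vfs_writable_from_string(getenv("SYSTEMD_NSPAWN_API_VFS_WRITABLE"));
}
```

Keeping the string mapping separate from the `getenv()` call makes the decision easy to check without modifying the process environment.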
* core: use the unified hierarchy for the systemd cgroup controller hierarchy | Tejun Heo | 2016-08-17 | 1 | -2/+4
  Currently, systemd uses either the legacy hierarchies or the unified hierarchy. When the legacy hierarchies are used, systemd uses a named legacy hierarchy mounted on /sys/fs/cgroup/systemd without any kernel controllers for process management. Due to the shortcomings in the legacy hierarchy, this involves a lot of workarounds and complexities.

  Because the unified hierarchy can be mounted and used in parallel to legacy hierarchies, there's no reason for systemd to use a legacy hierarchy for management even if the kernel resource controllers need to be mounted on legacy hierarchies. It can simply mount the unified hierarchy under /sys/fs/cgroup/systemd and use it without affecting other legacy hierarchies. This disables a significant amount of fragile workaround logic and would allow using features which depend on the unified hierarchy membership, such as the bpf cgroup v2 membership test. In time, this would also allow deleting the said complexities.

  This patch updates systemd so that it prefers the unified hierarchy for the systemd cgroup controller hierarchy when legacy hierarchies are used for kernel resource controllers.

  * cg_unified(@controller) is introduced which tests whether the specific controller is on the unified hierarchy and is used to choose the unified hierarchy code path for process and service management when available. Kernel controller specific operations remain gated by cg_all_unified().
  * "systemd.legacy_systemd_cgroup_controller" kernel argument can be used to force the use of legacy hierarchy for systemd cgroup controller.
  * nspawn: By default nspawn uses the same hierarchies as the host. If UNIFIED_CGROUP_HIERARCHY is set to 1, unified hierarchy is used for all. If 0, legacy for all.
  * nspawn: arg_unified_cgroup_hierarchy is made an enum and now encodes one of three options - legacy, only systemd controller on unified, and unified. The value is passed into mount setup functions and controls cgroup configuration.
  * nspawn: Interpretation of SYSTEMD_CGROUP_CONTROLLER to the actual mount option is moved to mount_legacy_cgroup_hierarchy() so that it can take an appropriate action depending on the configuration of the host.

  v2: - CGroupUnified enum replaces open coded integer values to indicate the cgroup operation mode.
      - Various style updates.
  v3: Fixed a bug in detect_unified_cgroup_hierarchy() introduced during v2.
  v4: Restored legacy container on unified host support and fixed another bug in detect_unified_cgroup_hierarchy().
* nspawn: add SYSTEMD_NSPAWN_USE_CGNS env variable (#3809) | Christian Brauner | 2016-07-26 | 1 | -1/+1
  SYSTEMD_NSPAWN_USE_CGNS allows disabling the use of cgroup namespaces.
* tree-wide: remove Emacs lines from all files | Daniel Mack | 2016-02-10 | 1 | -2/+0
  This should be handled fine now by .dir-locals.el, so no need to carry that stuff in every file.
* nspawn: skip /sys-as-tmpfs if we don't use private-network | Iago López Galeiras | 2015-10-20 | 1 | -1/+1
  Since v3.11/7dc5dbc ("sysfs: Restrict mounting sysfs"), the kernel doesn't allow mounting sysfs if you don't have CAP_SYS_ADMIN rights over the network namespace.

  So the code that mounts /sys as a tmpfs, introduced in d8fc6a000fe21b0c1ba27fbfed8b42d00b349a4b, doesn't work with user namespaces if we don't use private-net. The reason is that we mount sysfs inside the container and we're in the network namespace of the host but we don't have CAP_SYS_ADMIN over that namespace.

  To fix that, we mount /sys as a sysfs (instead of tmpfs) if we don't use private network and ignore the /sys-as-a-tmpfs code if we find that /sys is already mounted as sysfs.

  Fixes #1555
* nspawn: mount /sys as tmpfs, and then mount only select subdirs of the real sysfs below it | Lennart Poettering | 2015-09-30 | 1 | -0/+1
  This way we can hide things like /sys/firmware or /sys/hypervisor from the container, while keeping the device tree around.

  While this is a security benefit in itself it also allows us to fix issue #1277: previously we'd mount /sys before creating the user namespace, in order to be able to mount /sys/fs/cgroup/* beneath it (which resides in it), which we can only mount outside of the user namespace. To ensure that the user namespace owns the network namespace we'd set up the network namespace at the same time as the user namespace. Thus, we'd still see the /sys/class/net/ from the originating network namespace, even though we are in our own network namespace now. With this patch, /sys is mounted before transitioning into the user namespace as tmpfs, so that we can also mount /sys/fs/cgroup/* into it this early. The directories such as /sys/class/ are then later added in from the real sysfs from inside the network and user namespace so that they actually show what is available in it.

  Fixes #1277
* nspawn: fix user namespace support | Lennart Poettering | 2015-09-30 | 1 | -1/+1
  We didn't actually pass ownership of /run to the UID in the container since some releases, let's fix that.
* nspawn: split out mount related functions into a new nspawn-mount.c file | Lennart Poettering | 2015-09-07 | 1 | -0/+70