path: root/src/nspawn
Commit message [Author, Age, Files, Lines]
...
* meson: Do not include headers in source lists [Jan Janssen, 2023-01-24, 1 file, -16/+0]
|   Meson+ninja+compiler do this for us and are better at it. https://mesonbuild.com/FAQ.html#do-i-need-to-add-my-headers-to-the-sources-list-like-in-autotools
* path-util: rework file_in_same_dir() on top of path_extract_directory() [Lennart Poettering, 2023-01-24, 1 file, -7/+7]
|   Let's port one more over.
|
|   Note that this changes the behaviour of file_in_same_dir() in some regards. Specifically, a trailing slash in the input path will be treated differently: previously we'd operate below that dir then, instead of the parent. I think that makes little sense, however, and I think the code using this function doesn't expect that either. Moreover, this addresses some corner cases where the path is specified as "/" or ".", i.e. where we cannot extract a parent. These will now be treated as an error, which I think is much cleaner.
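A minimal sketch of the new semantics described above, in plain C without systemd's helpers. `same_dir_path()` is a hypothetical stand-in for file_in_same_dir(), and returning -EINVAL for "/" and "." is an assumption; the commit only says these inputs are now treated as errors.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified model of the reworked semantics: replace the
 * last component of `path` with `filename`. Trailing slashes are ignored,
 * so "/a/b/" yields a sibling of "b" rather than a child of it. */
static int same_dir_path(const char *path, const char *filename, char **ret) {
        size_t len = strlen(path);

        /* Drop trailing slashes: "/a/b/" is handled like "/a/b". */
        while (len > 0 && path[len - 1] == '/')
                len--;

        /* "/" (nothing left after trimming) and "." have no parent.
         * -EINVAL is an assumption for this sketch. */
        if (len == 0 || (len == 1 && path[0] == '.'))
                return -EINVAL;

        /* Walk back to the start of the last component. */
        size_t start = len;
        while (start > 0 && path[start - 1] != '/')
                start--;

        /* No directory part at all ("foo"): the result is just filename. */
        if (start == 0) {
                *ret = strdup(filename);
                return *ret ? 0 : -ENOMEM;
        }

        /* Directory part is path[0, dirlen); collapse the slashes before the
         * last component. dirlen == 0 means the parent is "/". */
        size_t dirlen = start - 1;
        while (dirlen > 0 && path[dirlen - 1] == '/')
                dirlen--;

        if (asprintf(ret, "%.*s/%s", (int) dirlen, path, filename) < 0)
                return -ENOMEM;
        return 0;
}
```

For example, `same_dir_path("/etc/foo.conf", "bar.conf", &p)` stores "/etc/bar.conf" in `p`, while "/a/b/" with "d" now gives "/a/d" instead of "/a/b/d".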
* loop-util: always tell kernel explicitly about loopback sector size [Lennart Poettering, 2023-01-18, 1 file, -0/+1]
|   Let's not leave the sector size unspecified: either set a user supplied value, or auto-detect the right size by probing the disk image accordingly.
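A sketch of the "user value or probe" decision. It assumes, as an illustration not taken from the commit, that probing means looking for a GPT header at each candidate sector size; `pick_sector_size()` is hypothetical and inspects an in-memory buffer instead of the image file, and the 512-byte fallback is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sector-size selection: an explicit user value wins;
 * otherwise probe the image. Assumed probe: a GPT header ("EFI PART")
 * sits at LBA 1, i.e. at a byte offset equal to the sector size, so
 * each candidate size is tested at its own offset. */
static uint32_t pick_sector_size(uint32_t user_size, const uint8_t *img, size_t img_len) {
        static const uint32_t candidates[] = { 512, 1024, 2048, 4096 };

        if (user_size != 0)
                return user_size;

        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
                size_t off = candidates[i];
                if (img_len >= off + 8 && memcmp(img + off, "EFI PART", 8) == 0)
                        return candidates[i];
        }

        return 512; /* nothing recognizable: assumed classic default */
}
```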
* nspawn: guard acl_free() with a NULL check [Lennart Poettering, 2023-01-06, 1 file, -1/+3]
|   Inspired by #25957, there's one other place where we don't guard acl_free() calls with a NULL check. Fix that.
* nspawn: port over basename() → path_extract_filename() [Lennart Poettering, 2022-12-23, 2 files, -6/+13]
|
* tree-wide: use -EBADF more [Yu Watanabe, 2022-12-21, 2 files, -6/+6]
|
* tree-wide: introduce PIPE_EBADF macro [Yu Watanabe, 2022-12-20, 1 file, -2/+2]
|
* tree-wide: use -EBADF for fd initialization [Zbigniew Jędrzejewski-Szmek, 2022-12-19, 5 files, -13/+13]
|   -1 was used everywhere, but -EBADF or -EBADFD started being used in various places. Let's make things consistent in the new style.
|
|   Note that there are two candidates:
|       EBADF   9  Bad file descriptor
|       EBADFD  77 File descriptor in bad state
|
|   Since we're initializing the fd, we're just assigning a value that means "no fd yet", so it's just a bad file descriptor, and the first errno fits better. If instead we had a valid file descriptor that became invalid because of some operation or state change, the other errno would fit better.
|
|   In some places, initialization is dropped if unnecessary.
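A small sketch of why -EBADF works as an initializer: any negative value means "no fd", so cleanup code can close unconditionally-initialized variables without special cases. `close_fd()`, `close_pair()` and `FD_PAIR_INIT` are hypothetical; the PIPE_EBADF macro from the commit listed above follows the same idea, but its exact definition is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close a valid fd, ignore "no fd" (negative) values,
 * and return -EBADF so callers can write: fd = close_fd(fd); */
static int close_fd(int fd) {
        if (fd >= 0)
                (void) close(fd);
        return -EBADF;
}

/* Hypothetical initializer for a pair of fds, e.g. the two ends of a pipe. */
#define FD_PAIR_INIT { -EBADF, -EBADF }

static void close_pair(int fds[2]) {
        fds[0] = close_fd(fds[0]);
        fds[1] = close_fd(fds[1]);
}
```

A pair declared as `int fds[2] = FD_PAIR_INIT;` can be passed to `close_pair()` whether or not `pipe()` ever filled it in.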
* mount-util: make mount_switch_root() take a mount propagation flag [Yu Watanabe, 2022-12-15, 1 file, -1/+1]
|
* Merge pull request #25723 from keszybz/generators-tmp [Yu Watanabe, 2022-12-15, 3 files, -91/+96]
|\  Run generators with / ro and /tmp mounted
| * tree-wide: use mode=0nnn for mount option [Zbigniew Jędrzejewski-Szmek, 2022-12-14, 2 files, -12/+17]
| |   This is an octal number. We used the 0 prefix in some places inconsistently. The kernel always interprets it in base 8, so this has no effect, but I think it's nicer to use the 0 to remind the reader that this is not a decimal number.
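A sketch of emitting such an option string so the octal prefix is always visible. `format_mode_option()` is a hypothetical helper; the point is only that "%o" prints octal digits and the literal leading 0 marks them as such for the reader.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Format a "mode=" mount option with an explicit leading 0, so a human
 * reads "mode=0755" (octal) rather than "mode=755". */
static int format_mode_option(char *buf, size_t size, mode_t mode) {
        int n = snprintf(buf, size, "mode=0%03o", (unsigned) (mode & 07777));
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Note that the C literal 0755 is the value 493; writing "mode=493" would mean something entirely different to a parser that reads base 8.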
| * nspawn: realign columns [Zbigniew Jędrzejewski-Szmek, 2022-12-13, 1 file, -79/+79]
| |   Follow-up for b9e7f22c2d80930cad36ae53e66e42a2996dca4a.
* | nspawn: remove cgroup socket [Christian Brauner, 2022-12-13, 1 file, -14/+3]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: remove pty socket [Christian Brauner, 2022-12-13, 1 file, -15/+3]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: remove rtnl socket [Christian Brauner, 2022-12-13, 1 file, -21/+8]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: s/kmsg_socket_pair/fd_inner_socket_pair/g [Christian Brauner, 2022-12-13, 1 file, -19/+27]
| |   Also stop stashing the kmsg fifo fd in the socket. Just retrieve it in the parent and have the parent hold on to it.
| |
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: s/fd_socket_pair/fd_outer_socket_pair/g [Christian Brauner, 2022-12-13, 1 file, -24/+24]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: remove uid socket [Christian Brauner, 2022-12-13, 1 file, -16/+6]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: remove uuid socket [Christian Brauner, 2022-12-13, 1 file, -13/+2]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: remove pid socket [Christian Brauner, 2022-12-13, 1 file, -13/+2]
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | nspawn: s/notify_socket/fd_socket/g [Christian Brauner, 2022-12-13, 1 file, -6/+6]
|/  Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* sd-id128: fold do_sync flag into Id128FormatFlag [Yu Watanabe, 2022-12-12, 1 file, -1/+1]
|
* sd-id128: make id128_read() or friends return -ENOPKG when the file contents is "uninitialized" [Yu Watanabe, 2022-12-12, 1 file, -3/+3]
|   Then, this drops ID128_PLAIN_OR_UNINIT. Also, this renames Id128Format -> Id128FormatFlag, and makes it a bitfield.
|
|   Fixes #25634.
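A sketch of the distinction the commit introduces: a file saying "uninitialized" is reported as -ENOPKG, separate from malformed contents. `parse_id128_contents()` is hypothetical, only handles the plain 32-hex-digit form, and returning -EINVAL for garbage is an assumption for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for machine-id style contents: 32 hex characters,
 * optionally followed by one newline. */
static int parse_id128_contents(const char *s, size_t n) {
        /* Strip one trailing newline, if any. */
        if (n > 0 && s[n - 1] == '\n')
                n--;

        /* "Not set up yet" gets its own error code. */
        if (n == strlen("uninitialized") && memcmp(s, "uninitialized", n) == 0)
                return -ENOPKG;

        if (n != 32)
                return -EINVAL; /* assumed code for malformed contents */

        for (size_t i = 0; i < n; i++)
                if (!isxdigit((unsigned char) s[i]))
                        return -EINVAL;

        return 0;
}
```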
* Merge pull request #25513 from brauner/pivot_root.nspawn [Luca Boccassi, 2022-12-06, 3 files, -31/+141]
|\  nspawn: support pivot_root()
| * nspawn: split mount tunnel setup [Christian Brauner, 2022-12-05, 1 file, -7/+24]
| |   Before we supported pivot_root(), nspawn used to make the rootfs shared before setting up the mount tunnel. So it was safe for it to just turn it into a dependent mount during setup. However, we cannot do this anymore because of the requirements pivot_root() has. After the pivot_root() we will make the rootfs shared recursively. If we turned the mount tunnel into a dependent mount before mount_switch_root(), this would have the consequence that it becomes a shared mount within the same peer group as the rootfs. So no mounts would propagate into the container from the host anymore.
| |
| |   To fix this we split setting up the mount tunnel and making it active into two steps. Setting up the mount tunnel is performed before mount_switch_root() and activating it afterwards. Note that this works because turning a shared mount into a shared mount is a nop. IOW, no new peer group will be allocated.
| |
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
| * nspawn: mount temporary visible procfs and sysfs instance [Christian Brauner, 2022-12-05, 3 files, -10/+107]
| |   In order to mount procfs and sysfs in an unprivileged container, the kernel requires that a fully visible instance is already present in the target mount namespace. Mount one here so the inner child can mount its own instances. Later we umount the temporary instances created here before we actually exec the payload.
| |
| |   Since the rootfs is shared, the umount will propagate into the container. Note, the inner child wouldn't be able to unmount the instances on its own since it doesn't own the originating mount namespace. IOW, the outer child needs to do this.
| |
| |   So far nspawn didn't run into this issue because it used MS_MOVE, which meant that the shadow mount tree pinned a procfs and sysfs instance which the kernel would find. The shadow mount tree is gone with proper pivot_root() semantics.
| |
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
| * nspawn: support pivot_root() [Christian Brauner, 2022-12-05, 1 file, -14/+10]
| |   In order to support pivot_root() we need to move mount propagation changes to after the pivot_root(). While MS_MOVE requires the source mount to not be a shared mount, pivot_root() also requires the target mount to not be a shared mount. This guarantees that pivot_root() doesn't leak any mounts.
| |
| |   Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | tree-wide: fix typo [Yu Watanabe, 2022-12-02, 1 file, -2/+2]
| |
* | nspawn: Use "Ctrl-" rather than "^" in info msg [Phaedrus Leeds, 2022-12-02, 1 file, -1/+1]
| |   Maybe most people know that "^]" means "Ctrl + ]", but for those that don't, this should be more clear.
* | dissect: rework DISSECT_IMAGE_ADD_PARTITION_DEVICES + DISSECT_IMAGE_OPEN_PARTITION_DEVICES [Lennart Poettering, 2022-12-01, 1 file, -1/+3]
|/  Currently, these two flags were implied by dissect_loop_device(), but that's not right, because this means systemd-gpt-auto-generator will dissect the root block device with these flags set and that's not desirable: the generator should not cause the partition devices to be created (we don't intend to use them right away after all, but expect udev to find/probe them first, and then mount them through .mount units). And there's no point in opening the partition devices, since we do not intend to mount them via fds either.
|
|   Hence, rework this: instead of implying the flags, specify them explicitly. While we are at it, let's also rename the flags to make them more descriptive:
|
|   DISSECT_IMAGE_MANAGE_PARTITION_DEVICES becomes DISSECT_IMAGE_ADD_PARTITION_DEVICES, since that's really all this does: add the partition devices via BLKPG.
|
|   DISSECT_IMAGE_OPEN_PARTITION_DEVICES becomes DISSECT_IMAGE_PIN_PARTITION_DEVICES, since we not only open the devices, but keep the devices open continuously (i.e. we "pin" them).
|
|   Also, drop the DISSECT_IMAGE_BLOCK_DEVICE combination flag, since it is misleading, i.e. it suggests it was appropriate to specify on all dissected block devices, but that's precisely not the case, see the systemd-gpt-auto-generator case. My guess is that the confusion around this was actually the cause for this bug we are addressing here.
|
|   Fixes: #25528
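A sketch of the difference between implied and explicit flags. The `SKETCH_*` names and values are hypothetical stand-ins, not systemd's actual enum; the two functions only model "the entry point adds flags behind the caller's back" versus "the caller's flags are used as-is".

```c
#include <assert.h>

/* Hypothetical flags modelled on the renamed ones. */
typedef enum {
        SKETCH_ADD_PARTITION_DEVICES = 1 << 0, /* create partition devices */
        SKETCH_PIN_PARTITION_DEVICES = 1 << 1, /* open them and keep them open */
} SketchDissectFlags;

/* Before (sketch): the loop-device entry point silently added both flags,
 * so even a caller that only inspects the partition table got them. */
static SketchDissectFlags effective_flags_before(SketchDissectFlags requested) {
        return requested | SKETCH_ADD_PARTITION_DEVICES | SKETCH_PIN_PARTITION_DEVICES;
}

/* After (sketch): nothing is implied; callers opt in explicitly. */
static SketchDissectFlags effective_flags_after(SketchDissectFlags requested) {
        return requested;
}
```

With the explicit variant, a generator-like caller passing 0 no longer causes partition devices to be created, while a caller that mounts partitions via fds requests both flags itself.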
* Merge pull request #25379 from keszybz/update-doc-links [Luca Boccassi, 2022-11-22, 1 file, -1/+1]
|\  Update doc links
| * tree-wide: BLS and DPS are now on uapi-group website [Zbigniew Jędrzejewski-Szmek, 2022-11-21, 1 file, -1/+1]
| |
* | nspawn: allow sched_rr_get_interval_time64 through seccomp filter [Sam James, 2022-11-18, 1 file, -0/+1]
|/  We only allow a selected subset of syscalls from nspawn containers and don't list any time64 variants (needed for 32-bit arches when built using TIME_BITS=64, which is relatively new).
|
|   We allow sched_rr_get_interval, which cpython's test suite makes use of, but we don't allow sched_rr_get_interval_time64.
|
|   The test failures when run in an arm32 nspawn container on an arm64 host were as follows:
|
|   ```
|   ======================================================================
|   ERROR: test_sched_rr_get_interval (test.test_posix.PosixTester.test_sched_rr_get_interval)
|   ----------------------------------------------------------------------
|   Traceback (most recent call last):
|     File "/var/tmp/portage/dev-lang/python-3.11.0_p1/work/Python-3.11.0/Lib/test/test_posix.py", line 1180, in test_sched_rr_get_interval
|       interval = posix.sched_rr_get_interval(0)
|                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   PermissionError: [Errno 1] Operation not permitted
|   ```
|
|   Then strace showed:
|
|   ```
|   sched_rr_get_interval_time64(0, 0xffbbd4a0) = -1 EPERM (Operation not permitted)
|   ```
|
|   This appears to be the only time64 syscall that isn't already included in one of the sets listed in nspawn-seccomp.c but whose non-time64 variant is. Checked over each of the time64 syscalls known to systemd and verified that none of the others had a non-time64 variant whitelisted in nspawn other than sched_rr_get_interval.
|
|   Bug: https://bugs.gentoo.org/880131
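The audit described at the end of this message can be sketched as a small check: for every "_time64" syscall, flag it if its base variant is allowed but the time64 variant is not. `count_missing_time64()` and the sample lists in the usage are hypothetical, not the contents of nspawn-seccomp.c, and only the "_time64" suffix is handled (some time64 syscalls, such as clock_gettime64, use a plain "64" suffix).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool in_list(const char *const *list, size_t n, const char *name) {
        for (size_t i = 0; i < n; i++)
                if (strcmp(list[i], name) == 0)
                        return true;
        return false;
}

/* Count "_time64" syscalls whose base variant is allowed while the time64
 * variant itself is not: the kind of gap closed for sched_rr_get_interval. */
static size_t count_missing_time64(const char *const *allowed, size_t n_allowed,
                                   const char *const *time64, size_t n_time64) {
        static const char suffix[] = "_time64";
        const size_t slen = sizeof(suffix) - 1;
        size_t missing = 0;

        for (size_t i = 0; i < n_time64; i++) {
                const char *t = time64[i];
                size_t len = strlen(t);
                char base[64];

                /* Skip names without the suffix or too long for the buffer. */
                if (len <= slen || len - slen >= sizeof(base) || strcmp(t + len - slen, suffix) != 0)
                        continue;

                /* Derive the base name by cutting off "_time64". */
                memcpy(base, t, len - slen);
                base[len - slen] = '\0';

                if (in_list(allowed, n_allowed, base) && !in_list(allowed, n_allowed, t))
                        missing++;
        }

        return missing;
}
```

With an allow-list of {"futex", "futex_time64", "sched_rr_get_interval"}, the check reports one gap; adding "sched_rr_get_interval_time64" brings it to zero.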
* nulstr-util: Declare NULSTR_FOREACH() iterator inline [Daan De Meyer, 2022-11-11, 1 file, -1/+0]
|
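A sketch of what declaring the iterator inline means for a nulstr loop (a sequence of NUL-terminated strings ended by an empty one, e.g. "foo\0bar\0"). `NULSTR_FOREACH_SKETCH` is a hypothetical re-creation, not systemd's macro verbatim: the loop variable is declared inside the `for`, so callers no longer need a separate `const char *i;`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: iterate over each string in a nulstr, declaring the
 * iterator in the for-statement itself. Stops at the terminating empty
 * string; a NULL list yields no iterations. */
#define NULSTR_FOREACH_SKETCH(i, l) \
        for (const char *i = (l); i && *i; i = strchr(i, '\0') + 1)

static size_t nulstr_count(const char *l) {
        size_t n = 0;

        NULSTR_FOREACH_SKETCH(s, l)
                n++;

        return n;
}
```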
* Rename def.h to constants.h [Zbigniew Jędrzejewski-Szmek, 2022-11-08, 2 files, -2/+2]
|   The name "def.h" originates from before the rule of "no needless abbreviations" was established. Let's rename the file to clarify that it contains a collection of various semi-related constants.
* basic: rename util.h to logarithm.h [Zbigniew Jędrzejewski-Szmek, 2022-11-08, 6 files, -6/+0]
|   util.h is now about logarithms only, so we can rename it. Many files included util.h for no apparent reason… Those includes are dropped.
* basic: move a bunch of cmdline-related funcs to new argv-util.c+h [Zbigniew Jędrzejewski-Szmek, 2022-11-08, 1 file, -0/+1]
|   I wanted to move saved_arg[cv] to process-util.c+h, but this causes problems: process-util.h includes format-util.h which includes net/if.h, which conflicts with linux/if.h. So we can't include process-util.h in some files. But process-util.c is very long anyway, so it seems nice to create a new file.
|
|   rename_process(), invoked_as(), invoked_by_systemd(), and argv_looks_like_help() which lived in process-util.c refer to saved_argc and saved_argv, so it seems reasonable to move them to the new file too.
|
|   util.c is now empty, so it is removed. util.h remains.
* basic: move version() to build.h+c [Zbigniew Jędrzejewski-Szmek, 2022-11-08, 1 file, -1/+1]
|
* nspawn: use in_same_namespace() helper [Christian Brauner, 2022-10-04, 1 file, -11/+3]
|
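A sketch of the check such a helper is commonly built on, assuming the comparison is done with stat() on /proc/<pid>/ns/<type> files: two namespace files refer to the same namespace when they have the same device and inode number. `same_inode()` is hypothetical and takes paths rather than PIDs.

```c
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same inode on
 * the same device, 0 if not, or a negative errno if stat() fails. */
static int same_inode(const char *a, const char *b) {
        struct stat sa, sb;

        if (stat(a, &sa) < 0)
                return -errno;
        if (stat(b, &sb) < 0)
                return -errno;

        return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For instance, `same_inode("/proc/self/ns/mnt", "/proc/1/ns/mnt")` would report whether the caller shares PID 1's mount namespace, given permission to inspect PID 1.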
* tree-wide: use the term "initrd" at most places we so far used "initramfs" [Lennart Poettering, 2022-09-23, 1 file, -4/+3]
|   In most cases we referred to the concept as "initrd". Let's convert most remaining uses of "initramfs" to "initrd" too, to stay internally consistent.
|
|   This leaves "initramfs" only where it's relevant to explain historical concepts or where "initramfs" is part of the API (i.e. in /run/initramfs).
|
|   Follow-up for: b66a6e1a5838b874b789820c090dd6850cf10513
* tree-wide: drop unused reference to DecryptedImage [Yu Watanabe, 2022-09-18, 1 file, -3/+1]
|
* nspawn: fix two error strings [Lennart Poettering, 2022-09-17, 1 file, -2/+2]
|
* tree-wide: use ASSERT_PTR more [David Tardon, 2022-09-13, 4 files, -117/+42]
|
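A sketch of the pattern an ASSERT_PTR-style macro enables: assert that a pointer is non-NULL and evaluate to it, collapsing `struct ctx *c = userdata; assert(c);` into one line. `ASSERT_PTR_SKETCH` is a hypothetical re-creation using the GNU C statement-expression and `__typeof__` extensions, not systemd's actual definition.

```c
#include <assert.h>

/* Hypothetical macro: evaluate expr once, assert it is non-NULL, and yield it. */
#define ASSERT_PTR_SKETCH(expr)                 \
        ({                                      \
                __typeof__(expr) _p = (expr);   \
                assert(_p);                     \
                _p;                             \
        })

struct ctx {
        int counter;
};

/* Typical callback shape: an opaque userdata pointer that must be set. */
static int bump(void *userdata) {
        struct ctx *c = ASSERT_PTR_SKETCH(userdata);
        return ++c->counter;
}
```

As with plain assert(), the check disappears when NDEBUG is defined, so this documents an invariant rather than validating input.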
* dissect-image: use loop backing file or device node as name of the image [Yu Watanabe, 2022-09-07, 1 file, -1/+0]
|   Note, currently, for each call of dissect_loop_device_and_warn(), the specified name is equivalent to the path passed to loop_device_make_by_path(). Hence, this should not change the current behavior.
* nspawn: add support for rootidmap bind option [Quentin Deslandes, 2022-09-05, 2 files, -2/+4]
|   The rootidmap bind option will map the root user from the container to the owner of the mounted directory on the filesystem. This will ensure files and directories created by the root user in the container will be owned by the directory owner on the filesystem. All other users will remain unmapped.
* nspawn: rename RemountIdmapFlags enum to RemountIdmapping [Quentin Deslandes, 2022-09-05, 2 files, -10/+10]
|   This enum should be used to define various idmapping modes for bind mounts which might be incompatible. Change its name and the names of its values to reflect that.
* dissect-image: introduce dissect_loop_device() which takes LoopDevice object [Yu Watanabe, 2022-09-03, 1 file, -5/+2]
|
* sd-device: rename devpath_from_devnum() -> devname_from_devnum() [Yu Watanabe, 2022-09-03, 1 file, -1/+1]
|   In sd-device, `devpath` is a kind of syspath without the '/sys' prefix, e.g. /devices/pci0000:00/0000:00:1c.4/0000:3c:00.0/nvme/nvme0/nvme0n1, and `devname` is a path to the device node, e.g. /dev/nvme0n1. Let's use the consistent name for the helper function.
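A sketch illustrating the devname side of that terminology: a path that reaches a device node, built from nothing but a dev_t. `block_devname_from_devnum()` is hypothetical and relies on the /dev/block/MAJ:MIN symlinks that udev maintains; the renamed helper resolves the canonical node (e.g. /dev/nvme0n1) through sd-device instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: format "/dev/block/MAJ:MIN" for a block device number.
 * Returns 0 on success, -1 if the buffer is too small. */
static int block_devname_from_devnum(dev_t devnum, char *buf, size_t size) {
        int n = snprintf(buf, size, "/dev/block/%u:%u", major(devnum), minor(devnum));
        return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```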
* loop-util: rework how we lock loopback block devices [Lennart Poettering, 2022-09-01, 1 file, -7/+1]
|   Let's rework how we lock loopback block devices in two ways:
|
|   1. Lock a separate fd, instead of the main block device fd. We already did that for our internal locking when allocating loopback block devices, but do so for the exposed locking (i.e. loop_device_flock()), too, so that the lock is independent of the main fd we actually use for IO.
|
|   2. Instead of locking the device during allocation of the loopback device, then unlocking it (which will make udev run), and then re-locking things if we need, let's instead just keep the lock the whole time, to make things a bit safer and faster, and not have to wait for udev at all. This is done by adding a "lock_op" parameter to loop device allocation functions that declares the initial state of the lock, and is one of LOCK_UN/LOCK_SH/LOCK_EX.
|
|   This change also shortens a lot of code, since we allocate + immediately lock loopback devices pretty much everywhere.
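A sketch of both ideas on an ordinary file: the BSD lock is taken on a separately opened fd, so it is independent of the fd used for IO, and the caller picks the initial lock state via a `lock_op` argument (LOCK_UN, LOCK_SH or LOCK_EX, optionally with LOCK_NB). `open_lock_fd()` is hypothetical; reopening by path is a simplification of how a real implementation might obtain the second fd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open a dedicated lock fd for `path` and put it into
 * the requested initial state. LOCK_UN means "open, but don't lock". */
static int open_lock_fd(const char *path, int lock_op) {
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd < 0)
                return -errno;

        if (lock_op != LOCK_UN && flock(fd, lock_op) < 0) {
                int r = -errno;
                (void) close(fd);
                return r;
        }

        return fd; /* the lock lives as long as this fd stays open */
}
```

Because flock() locks belong to the open file description, a second open_lock_fd() on the same path conflicts with the first even within one process, while the separate IO fd is unaffected.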
* tree-wide: Use correct format specifiers [Jan Janssen, 2022-08-30, 2 files, -7/+7]
|   gcc will complain about all these with -Wformat-signedness.