| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Meson+ninja+compiler do this for us and are better at it.
https://mesonbuild.com/FAQ.html#do-i-need-to-add-my-headers-to-the-sources-list-like-in-autotools
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's port one more over.
Note that this changes behaviour of file_in_same_dir() in some regards.
Specifically, a trailing slash of the input path will be treated
differently: previously we'd operate below that dir then, instead of the
parent. I think that makes little sense however, and I think the code
using this function doesn't expect that either.
Moroever, addresses some corner cases if the path is specified as "/" or
".", i.e. where e cannot extract a parent. These will now be treated as
error, which I think is much cleaner.
|
|
|
|
|
|
| |
Let's not leave the sector size unspecified: either set a user supplied
value, or auto-detect the right size by probing the disk image
accordingly.
|
|
|
|
|
|
|
| |
Inspired by #25957 there's one other place where we don't guard
acl_free() calls with a NULL check.
Fix that.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-1 was used everywhere, but -EBADF or -EBADFD started being used in various
places. Let's make things consistent in the new style.
Note that there are two candidates:
EBADF 9 Bad file descriptor
EBADFD 77 File descriptor in bad state
Since we're initializating the fd, we're just assigning a value that means
"no fd yet", so it's just a bad file descriptor, and the first errno fits
better. If instead we had a valid file descriptor that became invalid because
of some operation or state change, the other errno would fit better.
In some places, initialization is dropped if unnecessary.
|
| |
|
|\
| |
| | |
Run generators with / ro and /tmp mounted
|
| |
| |
| |
| |
| |
| | |
This is an octal number. We used the 0 prefix in some places inconsistently.
The kernel always interprets in base-8, so this has no effect, but I think
it's nicer to use the 0 to remind the reader that this is not a decimal number.
|
| |
| |
| |
| | |
Follow-up for b9e7f22c2d80930cad36ae53e66e42a2996dca4a.
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
Also stop stashing the kmsg fifo fd in the socket. Just retrieve it in
the parent and have the parent hold on to it.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| | |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
|/
|
|
| |
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
is "uninitialized"
Then, this drops ID128_PLAIN_OR_UNINIT. Also, this renames
Id128Format -> Id128FormatFlag, and make it bitfield.
Fixes #25634.
|
|\
| |
| | |
nspawn: support pivot_root()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Before we supported pivot_root() nspawn used to make the rootfs shared
before setting up the mount tunnel. So it was safe for it to just turn
it into a dependent mount during setup.
However, we cannot do this anymore because of the requirements
pivot_root() has. After the pivot_root() we will make the rootfs shared
recursively. If we turned the mount tunnel into dependent mount before
mount_switch_root() this will have the consequence that it becomes a
shared mount within the same peer group as the rootfs. So no mounts will
propagate into the container from the host anymore.
To fix this we split setting up the mount tunnel and making it active
into two steps. Setting up the mount tunnel is performed before
mount_switch_root() and activating it afterwards. Note that this works
because turning a shared mount into a shared mount is a nop. IOW, no new
peer group will be allocated.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to mount procfs and sysfs in an unprivileged container the
kernel requires that a fully visible instance is already present in the
target mount namespace. Mount one here so the inner child can mount its
own instances. Later we umount the temporary instances created here
before we actually exec the payload. Since the rootfs is shared the
umount will propagate into the container. Note, the inner child wouldn't
be able to unmount the instances on its own since it doesn't own the
originating mount namespace. IOW, the outer child needs to do this.
So far nspawn didn't run into this issue because it used MS_MOVE which
meant that the shadow mount tree pinned a procfs and sysfs instance
which the kernel would find. The shadow mount tree is gone with proper
pivot_root() semantics.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to support pivot_root() we need to move mount propagation
changes after the pivot_root(). While MS_MOVE requires the source mount
to not be a shared mount pivot_root() also requires the target mount to
not be a shared mount. This guarantees that pivot_root() doesn't leak
any mounts.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
|
| | |
|
| |
| |
| |
| |
| | |
Maybe most people know that "^]" means "Ctrl + ]" but for those that
don't, this should be more clear.
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DISSECT_IMAGE_OPEN_PARTITION_DEVICES
Curently, these two flags were implied by dissect_loop_device(), but
that's not right, because this means systemd-gpt-auto-generator will
dissect the root block device with these flags set and that's not
desirable: the generator should not cause the partition devices to be
created (we don't intend to use them right-away after all, but expect
udev to find/probe them first, and then mount them though .mount units).
And there's no point in opening the partition devices, since we do not
intend to mount them via fds either.
Hence, rework this: instead of implying the flags, specify them
explicitly.
While we are at it, let's also rename the flags to make them more
descriptive:
DISSECT_IMAGE_MANAGE_PARTITION_DEVICES becomes
DISSECT_IMAGE_ADD_PARTITION_DEVICES, since that's really all this does:
add the partition devices via BLKPG.
DISSECT_IMAGE_OPEN_PARTITION_DEVICES becomes
DISSECT_IMAGE_PIN_PARTITION_DEVICES, since we not only open the devices,
but keep the devices open continously (i.e. we "pin" them).
Also, drop the DISSECT_IMAGE_BLOCK_DEVICE combination flag, since it is
misleading, i.e. it suggests it was appropriate to specify on all
dissected blocking devices, but that's precisely not the case, see the
systemd-gpt-auto-generator case. My guess is that the confusion around
this was actually the cause for this bug we are addressing here.
Fixes: #25528
|
|\
| |
| | |
Update doc links
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We only allow a selected subset of syscalls from nspawn containers
and don't list any time64 variants (needed for 32-bit arches when
built using TIME_BITS=64, which is relatively new).
We allow sched_rr_get_interval which cpython's test suite makes
use of, but we don't allow sched_rr_get_interval_time64.
The test failures when run in an arm32 nspawn container on an arm64 host
were as follows:
```
======================================================================
ERROR: test_sched_rr_get_interval (test.test_posix.PosixTester.test_sched_rr_get_interval)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/var/tmp/portage/dev-lang/python-3.11.0_p1/work/Python-3.11.0/Lib/test/test_posix.py", line 1180, in test_sched_rr_get_interval
interval = posix.sched_rr_get_interval(0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 1] Operation not permitted
```
Then strace showed:
```
sched_rr_get_interval_time64(0, 0xffbbd4a0) = -1 EPERM (Operation not permitted)
```
This appears to be the only time64 syscall that isn't already included one of
the sets listed in nspawn-seccomp.c that has a non-time64 variant. Checked
over each of the time64 syscalls known to systemd and verified that none
of the others had a non-time64-variant whitelisted in nspawn other than
sched_rr_get_interval.
Bug: https://bugs.gentoo.org/880131
|
| |
|
|
|
|
|
|
| |
The name "def.h" originates from before the rule of "no needless abbreviations"
was established. Let's rename the file to clarify that it contains a collection
of various semi-related constants.
|
|
|
|
|
| |
util.h is now about logarithms only, so we can rename it. Many files included
util.h for no apparent reason… Those includes are dropped.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I wanted to move saved_arg[cv] to process-util.c+h, but this causes problems:
process-util.h includes format-util.h which includes net/if.h, which conflicts
with linux/if.h. So we can't include process-util.h in some files.
But process-util.c is very long anyway, so it seems nice to create a new file.
rename_process(), invoked_as(), invoked_by_systemd(), and argv_looks_like_help()
which lived in process-util.c refer to saved_argc and saved_argv, so it seems
reasonable to move them to the new file too.
util.c is now empty, so it is removed. util.h remains.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In most cases we refernced the concept as "initrd". Let's convert most
remaining uses of "initramfs" to "initrd" too, to stay internally
consistent.
This leaves "initramfs" only where it's relevant to explain historical
concepts or where "initramfs" is part of the API (i.e. in
/run/initramfs).
Follow-up for: b66a6e1a5838b874b789820c090dd6850cf10513
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
Note, currently, for each call of dissect_loop_device_and_warn(), the
specified name is equivalent to the path passed to loop_device_make_by_path().
Hence, this should not change the current behavios.
|
|
|
|
|
|
|
|
| |
rootidmap bind option will map the root user from the container to the
owner of the mounted directory on the filesystem. This will ensure files
and directories created by the root user in the container will be owned
by the directory owner on the filesystem. All other user will remain
unmapped.
|
|
|
|
|
|
| |
This enum should be used to define various idmapping modes for bind
mounts which might be incompatible. Changing its name and the values
name to reflect that.
|
| |
|
|
|
|
|
|
|
|
| |
In sd-device, `devpath` is a kind of syspath without '/sys' prefix, e.g.
/devices/pci0000:00/0000:00:1c.4/0000:3c:00.0/nvme/nvme0/nvme0n1,
and `devname` is a path to the device node, e.g. /dev/nvme0n1.
Let's use the consistent name for the helper function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's rework how we lock loopback block devices in two ways:
1. Lock a separate fd, instead of the main block device fd. We already
did that for our internal locking when allocating loopback block
devices, but do so for the exposed locking (i.e.
loop_device_flock()), too, so that the lock is independent of the
main fd we actually use of IO.
2. Instead of locking the device during allocation of the loopback
device, then unlocking it (which will make udev run), and then
re-locking things if we need, let's instead just keep the lock the
whole time, to make things a bit safer and faster, and not have to
wait for udev at all. This is done by adding a "lock_op" parameter to
loop device allocation functions that declares the initial state of
the lock, and is one of LOCK_UN/LOCK_SH/LOCK_EX. This change also
shortens a lot of code, since we allocate + immediately lock loopback
devices pretty much everywhere.
|
|
|
|
| |
gcc will complain about all these with -Wformat-signedness.
|