path: root/src/shared
Commit message (Author, Age, Files, Lines)
* switch-root: add a comment regarding the safety limits of rm_rf_children() (Lennart Poettering, 2023-05-17, 1 file, -0/+2)
|
* Merge pull request #27638 from YHNdnzj/upheldby-unit-file (Mike Yuan, 2023-05-16, 2 files, -5/+17)
|\
| |    unit-file: support UpheldBy= in [Install] settings (adding Upholds= deps from .upholds/)
| * unit-file: support UpheldBy= in [Install] settings (adding Upholds= deps from .upholds/) (Mike Yuan, 2023-05-15, 2 files, -5/+17)
| |    Closes #26896
* | Merge pull request #27648 from poettering/common-dissect-dir (Lennart Poettering, 2023-05-16, 3 files, -14/+31)
|\ \
| | |    pid1: add common root dir inode to mount disk images to in private namespaces
| * | dissect-image: port mount_image_privately_interactively() to use /run/systemd/mount-rootfs/ too (Lennart Poettering, 2023-05-16, 1 file, -14/+18)
| | |    Let's use the same common directory as the unit logic uses. This means we have less to clean up, and it opens the door to eventually allowing unprivileged operation of the mount_image_privately_interactively() logic.
| * | namespace: introduce a common dir in /run/ that we can use to see new root fs up on (Lennart Poettering, 2023-05-16, 1 file, -0/+5)
| | |    This creates a new dir /run/systemd/mount-rootfs/ early in PID 1 that thus always exists. It's supposed to be used by any code that creates its own mount namespace and then sets up a new root dir to switch into.
| | |
| | |    So far in many cases we used a temporary dir (which needed explicit clean-up) or a purpose-specific fixed dir. Let's create a common dir instead, that always exists (as it is created in PID 1 early on, always).
| | |
| | |    Besides making things more robust, as manual clean-up of the inode is not necessary anymore, this also opens the door for unprivileged programs to use the same dir, since it now always exists.
| | |
| | |    Set the access mode to 555 (instead of the otherwise previously used 0755, 0700 or similar), so that unprivileged programs can access it, but we make clear it's not supposed to be written directly to, by anyone, not even root.
| * | mount-util: add umount_and_free() helper (Lennart Poettering, 2023-05-16, 1 file, -0/+8)
| | |
* | | Merge pull request #27647 from poettering/mount-setup-tweaklets (Lennart Poettering, 2023-05-16, 1 file, -21/+17)
|\ \ \
| | | |    mount-setup: minor tweaks
| * | | mount-setup: minor modernization (Lennart Poettering, 2023-05-16, 1 file, -15/+13)
| | | |
| * | | mount-setup: minor log improvement (Lennart Poettering, 2023-05-16, 1 file, -1/+1)
| | | |
| * | | mount-setup: port to logging about mount attempts via mount_*follow_verbose() (Lennart Poettering, 2023-05-16, 1 file, -5/+3)
| |/ /
* | | base-filesystem: mention why we don't carry an entry for /tmp/ for now (Lennart Poettering, 2023-05-16, 1 file, -0/+4)
| | |
* | | base-filesystem: also set up /run/ mount point if missing (Lennart Poettering, 2023-05-16, 1 file, -0/+1)
|/ /
| |    We don't support images without it, hence create this one too, like we create all other relevant mount points we definitely require for booting.
* | watchdog: always disarm watchdog properly before closing it (Lennart Poettering, 2023-05-15, 1 file, -5/+10)
|/
|    If we change the watchdog device we should disarm the old one first. Similarly, if we open the watchdog but then fail to set it up, disarm it before closing it again.
* conf-parser: Add root argument to config_parse_many() (Daan De Meyer, 2023-05-12, 2 files, -3/+6)
|
* mkfs-util: Add quiet argument to make_filesystem() (Daan De Meyer, 2023-05-12, 2 files, -12/+30)
|    We default to quiet operation everywhere except for repart, where we disable quiet and have the mkfs tools write to stdout. We also make sure --quiet or equivalent is implemented for all mkfs tools.
* bus-util: drop unnecessary continue (Yu Watanabe, 2023-05-09, 1 file, -1/+1)
|
* parse-util: make parse_fd() return -EBADF (Yu Watanabe, 2023-05-08, 2 files, -5/+0)
|    The previous error code -ERANGE is slightly ambiguous, so use a more specific one. This also drops unnecessary error handling.
|
|    Follow-up for 754d8b9c330150fdb3767491e24975f7dfe2a203 and e652663a043cb80936bb12ad5c87766fc5150c24.
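The error-code distinction described in the parse_fd() commit can be illustrated with a small sketch. This is not the systemd implementation: `parse_fd_sketch()` is a hypothetical name, and the choice of -EINVAL for malformed input and -ERANGE for values that do not fit into an int is an assumption made for the example. Only the use of -EBADF for a well-formed but negative number is taken from the commit message.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for parse_fd(); returns the fd or a negative errno. */
static int parse_fd_sketch(const char *s) {
        char *end = NULL;
        long v;

        if (!s || !*s)
                return -EINVAL;          /* nothing to parse */

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno == ERANGE || v > INT_MAX || v < INT_MIN)
                return -ERANGE;          /* does not fit into an int at all */
        if (*end != '\0')
                return -EINVAL;          /* no digits, or trailing garbage */
        if (v < 0)
                return -EBADF;           /* a valid integer, but not a valid fd */

        return (int) v;
}
```

With this split a caller can tell "the string is not a number" apart from "the number cannot be a file descriptor".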
* Merge pull request #26357 from ddstreet/tpm2_policy_session (Luca Boccassi, 2023-05-06, 2 files, -87/+129)
|\
| |    Tpm2 policy session
| * tpm2: move policy building out of policy session creation (Dan Streetman, 2023-05-05, 1 file, -87/+112)
| |    This retains the use of policy sessions instead of trial sessions in most cases, based on the code comment that some TPMs do not implement trial sessions correctly. However, it's likely that the issue was not the TPMs, but our code's incorrect use of PolicyPCR inside a trial session; we are not providing expected PCR values with our call to PolicyPCR inside a trial session, but the spec indicates that in a trial session, the TPM *may* return an error if the expected PCR value(s) are not provided. That may have been the source of the original confusion about trial sessions.
| |
| |    More details: https://github.com/systemd/systemd/pull/26357#pullrequestreview-1409983694
| |
| |    Also, future commits will replace the use of trial sessions with policy calculations, which avoids the problem entirely.
| * tpm2: add tpm2_is_encryption_session() (Dan Streetman, 2023-05-05, 2 files, -0/+17)
| |
* | shared: refuse fd == INT_MAX (Frantisek Sumsal, 2023-05-05, 1 file, -0/+14)
|/
|    We do `FD_TO_PTR(fd)`, which expands to `INT_TO_PTR((fd)+1)` and thus triggers an integer overflow for fd == INT_MAX.
|
|    Resolves: #27522
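A minimal sketch of the overflow guard from the INT_MAX commit follows. The fd is stored as a pointer with an offset of one, so that fd 0 does not collide with NULL, which means the largest int cannot be encoded. The helper name `fd_to_ptr_checked()` and the returned error codes are assumptions for illustration, not the code added by the commit.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: encode an fd as a pointer, refusing values that cannot be stored. */
static int fd_to_ptr_checked(int fd, void **ret) {
        if (fd < 0)
                return -EBADF;           /* not a valid fd */
        if (fd == INT_MAX)
                return -EINVAL;          /* fd + 1 would overflow a signed int */

        /* Offset by one so that fd 0 is distinguishable from NULL. */
        *ret = (void *) (intptr_t) (fd + 1);
        return 0;
}
```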
* Merge pull request #27536 from dtardon/checked-fd-parsing (Luca Boccassi, 2023-05-05, 2 files, -9/+9)
|\
| |    Always check parsed fds for validity
| * tree-wide: use parse_fd() (David Tardon, 2023-05-05, 2 files, -9/+9)
| |
* | tpm2 PCRs: fix unchecked attempt to set PCR[24] (OMOJOLA, 2023-05-05, 1 file, -1/+1)
|/
* mount-util: simplify mount_switch_root() a bit (Lennart Poettering, 2023-05-03, 2 files, -37/+37)
|    There's no need to fchdir() out of the rootfs and back into it around the umount2(), hence don't. This brings the logic closer to what the pivot_root() man page suggests.
|
|    While we are at it, always operate based on fds, once we opened the original dir, and pass the path string along only for generating messages (i.e. as "decoration").
|
|    Add tests for both code paths: the pivot_root() one and the MS_MOVE one.
* base-filesystem: unify common parts of base_filesystem_create_fd() branches (Lennart Poettering, 2023-05-03, 1 file, -25/+13)
|    The error handling and fchmodat() invocation are pretty much the same in the directory and symlink branches, hence make them the same.
|
|    No real change in behaviour. Just refactoring.
* base-filesystem: add new helper base_filesystem_create_fd() that operates on an fd, instead of a path (Lennart Poettering, 2023-05-03, 3 files, -6/+17)
|    This also changes the open flags from O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC|O_NOFOLLOW to O_DIRECTORY|O_CLOEXEC.
|
|    O_RDONLY is redundant, since O_RDONLY is zero anyway, and O_DIRECTORY pins the access mode enough: it doesn't allow read()/write() anyway when specified.
|
|    O_NONBLOCK is also pointless given that O_DIRECTORY is specified, it has no meaning on directories. (It is useful if we don't know much about the inode we are opening, and could be a device node or fifo, but the O_DIRECTORY excludes that case.)
|
|    O_NOFOLLOW is dropped since there's really no point in blocking out the initial entrypoint being a symlink. Once we pinned the root of the tree it might make sense to restrict symlink use below it, but for the entrypoint itself it doesn't matter.
* switch-root: don't require /mnt/ when switching root into host OS (Lennart Poettering, 2023-05-03, 2 files, -40/+49)
|    So far, we invoked pivot_root() specifying /mnt/ as second argument, which we then unmounted right after. We'd create /mnt/ if needed. This sucks, because it means /mnt/ must strictly be pre-created on immutable images.
|
|    Remove this limitation, by using pivot_root() with "." as source and target, which will result in two stacked mounts afterwards: the new one underneath, the old one on top. We can then simply unmount the top one, and have what we want without needing any extra /mnt/ dir.
|
|    Since we don't need /mnt/ anymore we can get rid of the extra unmount_old_root parameter and simply specify it as NULL if we don't want the old mount to stick around.
* Merge pull request #27504 from mrc0mmand/fuzz-manager-serialize (Yu Watanabe, 2023-05-03, 2 files, -1/+6)
|\
| |    test: add a simple fuzzer for manager serialization
| * shared: reject empty attachment path (Frantisek Sumsal, 2023-05-03, 1 file, -0/+3)
| |
| * shared: ignore invalid valink socket fd when deserializing (Frantisek Sumsal, 2023-05-03, 1 file, -1/+3)
| |
* | Merge pull request #27492 from poettering/base-filesystem-000 (Mike Yuan, 2023-05-02, 2 files, -6/+4)
|\ \
| | |    base-filesystem: create /proc, /sys, /dev mount points as 555
| * | mount-setup: use size_t when iterating through array indexes (Lennart Poettering, 2023-05-02, 1 file, -3/+1)
| | |
| * | base-filesystem: create /proc, /sys, /dev mount points as 0555 (Lennart Poettering, 2023-05-02, 1 file, -3/+3)
| |/
| |    These inodes are going to be overmounted anyway, hence let's create them with access mode 555, so that they are as close to being immutable as regular UNIX access modes allow them to be.
| |
| |    In other words: this takes the "w" mode away for root. This of course usually has little effect -- unless CAP_DAC_OVERRIDE is dropped. But at the very least it makes the point clear that inodes should be considered immutable.
| |
| |    (I intended to make this 0000 originally, but that doesn't work, as many tools – including our own – have fallback paths that when they see ENOENT in /proc/ they can handle this gracefully. But changing the mode to 000 would turn this to EACCES - something they usually have no fallback path for)
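As a rough illustration of creating such a mount point, the sketch below makes a directory and forces its mode to 0555. `make_mount_point()` is a hypothetical helper, not the base-filesystem code. The explicit chmod() is there because mkdir() applies the process umask.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper: create an empty directory meant to be overmounted later. */
static int make_mount_point(const char *path) {
        if (mkdir(path, 0555) < 0)
                return -errno;

        /* mkdir() is subject to the umask, so set the final mode explicitly. */
        if (chmod(path, 0555) < 0)
                return -errno;

        return 0;
}
```

With mode 0555 a lookup of a missing entry inside the directory still fails with ENOENT, whereas mode 0000 would make it fail with EACCES for unprivileged callers, which is the distinction the commit message relies on.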
* | tree-wide: Handle EADDRNOTAVAIL as journal corruption (Daan De Meyer, 2023-05-02, 1 file, -6/+6)
|/
|    Journal corruption is not only indicated by EBADMSG but also by EADDRNOTAVAIL, so treat that as corruption in a few more cases.
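The check described here amounts to treating two error codes alike. A minimal sketch, assuming a hypothetical helper name `error_is_journal_corruption()` that accepts either a positive errno or a negative errno-style return value:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if the error indicates a corrupted journal file. */
static bool error_is_journal_corruption(int r) {
        int e = abs(r);   /* accept both -EBADMSG and EBADMSG style values */

        return e == EBADMSG || e == EADDRNOTAVAIL;
}
```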
* tpm2: move openssl-required ifdef code out of policy-building function (Dan Streetman, 2023-05-01, 1 file, -40/+58)
|
* copy: shortcut reflink_range() to reflink() in some cases (Lennart Poettering, 2023-04-28, 1 file, -0/+6)
|
* copy: don't call clone ioctls twice (Lennart Poettering, 2023-04-28, 1 file, -9/+5)
|    The btrfs name and the generic name have the same values, hence there's no point in bothering with the former.
* Merge pull request #27440 from yuwata/reflink-follow-ups (Luca Boccassi, 2023-04-28, 2 files, -3/+3)
|\
| |    copy: follow ups for reflink()
| * copy: rename reflink_full() -> reflink_range() (Yu Watanabe, 2023-04-28, 2 files, -3/+3)
| |    The commit b640e274a7c363a2b6394c9dce5671d9404d2e2a introduced reflink() and reflink_full(). We usually name a function xyz_full() when it is the fully parameterized version of xyz(), and xyz() is typically an inline alias of xyz_full(). But in this case, reflink() and reflink_full() call different ioctls. Moreover, reflink_full() does a partial reflink, while reflink() does a full file reflink. That's super confusing.
| |
| |    Let's rename reflink_full() to reflink_range(). The new name is consistent with the ioctl name, and should be fine.
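The difference between the two operations can be sketched with the generic Linux ioctls from `<linux/fs.h>`: FICLONE shares the whole source file, FICLONERANGE shares a byte range. The helper names `reflink_whole()` and `reflink_part()` are hypothetical and do not reproduce the systemd functions, which additionally handle fallbacks.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FICLONE, FICLONERANGE, struct file_clone_range */

/* Hypothetical helper: make dst_fd share all data blocks of src_fd. */
static int reflink_whole(int src_fd, int dst_fd) {
        return ioctl(dst_fd, FICLONE, src_fd) < 0 ? -errno : 0;
}

/* Hypothetical helper: share only the given byte range of src_fd. */
static int reflink_part(int src_fd, uint64_t src_off,
                        int dst_fd, uint64_t dst_off, uint64_t len) {
        struct file_clone_range args = {
                .src_fd = src_fd,
                .src_offset = src_off,
                .src_length = len,       /* 0 means "up to the end of the source file" */
                .dest_offset = dst_off,
        };

        return ioctl(dst_fd, FICLONERANGE, &args) < 0 ? -errno : 0;
}
```

Both calls fail with an error such as EOPNOTSUPP or EXDEV when the file system cannot share extents between the two files, so a caller normally falls back to a regular copy.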
* | Merge pull request #27424 from dtardon/auto-cleanup (Yu Watanabe, 2023-04-28, 1 file, -5/+4)
|\ \
| |/
|/|    More automatic cleanup
| * specifier: use _cleanup_ (David Tardon, 2023-04-27, 1 file, -5/+4)
| |
* | copy: Introduce reflink() and reflink_full() (Daan De Meyer, 2023-04-28, 4 files, -39/+54)
| |    The kernel has had filesystem-independent reflink ioctls for a while now, let's try to use them and fall back to the btrfs specific ones if they're not supported.
* | pam-systemd: disconnect bus connection when leaving session hook, even on error (Lennart Poettering, 2023-04-27, 2 files, -24/+90)
| |    This adds support for systematically destroying connections in pam_sm_session_open() even on failure, so that under no circumstances unserved D-Bus connections are around while the invoking process waits for the session to end. Previously we'd only do this on success, now we do it in all cases.
| |
| |    This matters since so far we suggested people hook pam_systemd into their pam stacks prefixed with "-", so that login proceeds even if pam_systemd fails. This however means that in an error case our cached connection doesn't get disconnected even if the session then is invoked. This fixes that.
* | pam-util: include PID in PAM data field id (Lennart Poettering, 2023-04-27, 1 file, -2/+19)
| |    Let's systematically avoid sharing cached busses between processes (i.e. from parent and child after fork()), by including the PID in the field name. With that we're never tempted to use a bus object the parent created in the child.
| |
| |    (Note this is about *use*, not about *destruction*. Destruction needs to be checked by other means.)
* | Merge pull request #25622 from YHNdnzj/tmpfiles-X-bit-support (Mike Yuan, 2023-04-27, 2 files, -12/+63)
|\ \
| | |    tmpfiles: add conditionalized execute permission (X) support
| * | tmpfiles: add conditionalized execute bit (X) support (Mike Yuan, 2023-04-27, 2 files, -12/+63)
| | |    According to setfacl(1), "the character X stands for the execute permission if the file is a directory or already has execute permission for some user."
| | |
| | |    After this commit, parse_acl() returns three ACL objects. The newly-added acl_exec object contains entries that are subject to conditionalized execute bit mangling. In tmpfiles, we iterate over the acl_exec object, check the permission of the target files, and remove the execute bit if necessary.
| | |
| | |    Here's an example entry:
| | |
| | |    A /tmp/test - - - - u:test:rwX
| | |
| | |    Closes #25114
* | | core: change ownership of subcgroup we create recursively, it shall be owned by the user delegated to (Lennart Poettering, 2023-04-27, 2 files, -0/+77)
| | |    If we create a subcgroup (regardless of whether it is the '.control' subgroup we always created or one configured via DelegateSubgroup=) it's inside of the delegated territory of the cgroup tree, hence it should be owned fully by the unit's users. Hence do so.
* | | core: add DelegateSubgroup= setting (Lennart Poettering, 2023-04-27, 1 file, -1/+2)
|/ /
| |    This implements a minimal subset of #24961, but in a much more restrictive way: we only allow one level of subcgroup (as that's enough to address the no-processes in inner cgroups rule), and it does not change anything about threaded cgroup logic or similar, or make any of this new behaviour mandatory.
| |
| |    All this does is this: all non-control processes we invoke for a unit we'll invoke in a subgroup by the specified name.
| |
| |    We'll later port all our current services that use cgroup delegation over to this, i.e. user@.service, systemd-nspawn@.service and systemd-udevd.service.