| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
We already have the systemd.tty.xxx kernel cmdline arguments for
configuring tty's for services, let's make sure the term cmdline
argument applies to pid1 as well.
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we switch root and can't execute an init program afterwards, we're
completely stuck as we can't go back to the initramfs to start
emergency.service as it will have been completely removed by the switch
root operation.
To prevent leaving users with a completely undebuggable system, let's
at least check before we switch root whether at least one of the init
programs we might want to execute actually exist, and fail early if
none of them exists.
|
|
|
|
|
|
|
| |
When running in a VM, we now support propagating the exit status
via a vsock notify socket, so drop the restrictions on propagating
an exit status when not in a container to make sure this works
properly.
|
|
|
|
|
|
|
|
| |
The previous error code -ERANGE is slightly ambiguous, and use more
specific one. This also drops unnecessary error handlings.
Follow-up for 754d8b9c330150fdb3767491e24975f7dfe2a203 and
e652663a043cb80936bb12ad5c87766fc5150c24.
|
| |
|
| |
|
|
|
|
| |
Follow-up-for: 2b5107e1625e0847179da0d35eb544192766886f
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
So far, we invoked pivot_root() specifying /mnt/ as second argument,
which then unmounted right-after. We'd create /mnt/ if needed. This
sucks, because it means /mnt/ must strictly be pre-created on immutable
images.
Remove this limitation, by using pivot_root() with "." as source and
target, which will result in two stacked mounts afterwards: the new one
underneath, the old one ontop. We can then simply unmount the top one,
and have what we want without needing any extra /mnt/ dir.
Since we don't need /mnt/ anymore we can get rid of the extra
unmount_old_root parameter and simply specify it as NULL if we don't
want the old mount to stick around.
|
|
|
|
|
|
| |
Let's make sure we parse the logging environment ASAP so that the
options apply to more code. e.g. to allow debugging kmod-setup.c
for example.
|
|
|
|
|
|
| |
Follow-up for d2ebd50d7f9740dcf30e84efc75610af173967d2
Fixes #27105
|
| |
|
|
|
|
|
|
|
|
|
| |
When running in a container, we can propagate the exit status of
pid1 as usual via the process exit status. This is not possible
when running in a VM. Instead, let's send EXIT_STATUS=%i via the
notify socket if one is configured. The user running the VM can then
pick up the exit status from the notify socket after the VM has shut
down.
|
|\
| |
| | |
proc-cmdline: filter PID1 arguments on container
|
| |
| |
| |
| |
| |
| |
| |
| | |
Otherwise, PID1 arguments e.g. "--deserialize 16" may be parsed
unexpectedly by generators.
Fixes the issue reported at
https://github.com/systemd/systemd/issues/24452#issuecomment-1475004433.
|
| |
| |
| |
| |
| |
| |
| | |
For future-proof reasons, in case we will add
another option that starts with --deserialize.
Addresses https://github.com/systemd/systemd/commit/4f44d2c4f76922a4f48dd4473e6abaca40d7e555#r107285603
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Follow-up for d2ebd50d7f9740dcf30e84efc75610af173967d2
We now modify our cmdline to use '=' for all arguments,
but didn't change early setup check to work with that.
So every daemon-reexec does a full setup, thus breaking
running user sessions.
Fixes #27106
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have three states:
- ENABLE_COREDUMP and systemd-coredump is installed,
- ENABLE_COREDUMP but systemd-coredump is not installed,
- !ENABLE_COREDUMP.
In the last case we would not do any coredumping-related setup in pid1, which
means that coredumps would go to to the working directory of the process, but
actually limits are set to 0. This is inherited by children of pid1.
As discussed extensively in https://github.com/systemd/systemd/pull/26607, this
default is bad: dumps are written to arbitrary directories and not cleaned up.
Nevertheless, the kernel cannot really fix it. It doesn't know where to write,
and it doesn't know when that place would become available. It is only the
userspace that can tell this to the kernel. So the only sensible change in the
kernel would be to default to '|/bin/false', i.e. do what we do now.
In the middle case, we disabled writing of coredumps via a pattern, but raise
the RLIMIT_CORE. We need to raise the limit because we can't raise it later
after processes have been forked off. This means we behave correctly, but allow
coredumping to be enabled at a later point without a reboot.
This patch makes the last case behave like the middle case. This means that
even if systemd is compiled with systemd-coredump, it still does the usual
setup. If users want to restore the kernel default, they need to provide two
drop-in files:
for sysctl.d, with 'kernel.core_pattern=core'
for systemd.conf, with 'DefaultLimitCORE=0'.
The general idea is that pid1 does the safe thing. A distro may want to use
something different than the systemd-coredump machinery, and then that would
could packaged together with the drop-ins to change the configuration.
Alternative-for: #26607
|
|
|
|
|
|
|
|
|
|
|
| |
'pos' is incremented after each assignment. If we use the maximum number
of arguments, we end up with pos==9 after all the assignments, and it
points to where the next value would be assigned. This position must remain
NULL.
The assert I "fixed" was intentionally introduced in
26abdc73a212b90f7c4b71808a1028d2e87ab09f as a bugfix. So my "fix" repeated
the same error that was fixed back then.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixup for d2ebd50d7f9740dcf30e84efc75610af173967d2
and 6920049fad4fa39db5fec712f82f7f75b98fd4b9:
- add a comment that the last arg must be NULL and adjust the assert.
- move initialization around so that fields are declared,
initialized, and consumed in the same order.
- move declaration of pos adjacent do declaration of command_line.
This makes it easy to see that it was not initialized correctly.
- initialize buffers before writing the pointer into the args array.
This makes no difference for the compiler, but it just feels "wrong"
to do it in opposite order.
Because pos was off, we would ignore args after the timeout, and also
overwrite the buffer if enough args were used.
I think this is case shows clearly that declaring all variables at the
top of the function, with some initialized and other not, is very
error-prone. The compiler has no issue with declaring variables whereever,
and we should take advantage of this to make it keep declaration,
initialization, and use close. (Within reason of course.)
|
|
|
|
|
|
|
| |
To make ConditionKernelCommandLine= or friend not confused when we are
running in a container.
Addresses https://github.com/systemd/systemd/pull/26887#discussion_r1143358884.
|
|
|
|
| |
Follow-up for c5673ed0de3bec38f68d8113d253842b47766e27.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All daemons use a similar scheme to read their main config files and theirs
drop-ins. The main config files are always stored in /etc/systemd directory and
it's easy enough to construct the name of the drop-in directories based on the
name of the main config file.
Hence the new helper does that internally, which allows to reduce and simplify
the args passed previously to config_parse_many_nulstr().
Besides the overall code simplification it results:
16 files changed, 87 insertions(+), 159 deletions(-)
it allows to identify clearly the locations in the code where configuration
files are parsed.
|
|
|
|
| |
string
|
|
|
|
|
|
|
|
|
|
|
|
| |
In various tools and services we have a per-system and per-user concept.
So far we sometimes used a boolean indicating whether we are in system
mode, or a reversed boolean indicating whether we are in user mode, or
the LookupScope enum used by the lookup path logic.
Let's address that, in introduce a common enum for this, we can use all
across the board.
This is mostly just search/replace, no actual code changes.
|
|
|
|
|
|
| |
This enables the ManagerEnvironment= settings in the user's user.conf to
reference some user data like $HOME for the purpose of setting
environment variables derived from these values.
|
|
|
|
|
|
| |
Follow-up for #26393.
Addresses https://github.com/systemd/systemd/pull/26393#issuecomment-1458655798.
|
|
|
|
|
|
| |
Reloading might be slow, especially when under memory pressure, and watchdogs
might be triggered. It is useful to have timestamped telemetry in the journal
to see how long a reload takes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the support for valgrind was under HAVE_VALGRIND_VALGRIND_H, i.e. we
would enable if the valgrind headers were found. The operations then we be
conditionalized on RUNNING_UNDER_VALGRIND.
But in a few places we had code which was conditionalized on VALGRIND, i.e. the
config option. I noticed because I compiled with -Dvalgrind=true on a machine
that didn't have valgrind.h, and the build failed because
RUNNING_UNDER_VALGRIND was not defined. My first idea was to add a check that
the header is present if the option is set, but it seems better to just remove
the option. The code to support valgrind is trivial, and if we're
!RUNNING_UNDER_VALGRIND, it has negligible cost. And the case of running under
valgrind is always some special testing/debugging mode, so we should just do
those extra steps to make valgrind output cleaner. Removing the option makes
things simpler and we don't have to think if something should be covered by the
one or the other configuration bit.
I had a vague recollection that in some places we used -Dvalgrind=true not
for valgrind support, but to enable additional cleanup under other sanitizers.
But that code would fail to build without the valgrind headers anyway, so
I'm not sure if that was still used. If there are uses like that, we can
extend the condition for cleanup_pools().
|
|
|
|
|
|
|
|
|
| |
We should be more careful with distinguishing the cases "all bits set in
caps mask" from "cap mask invalid". We so far mostly used UINT64_MAX for
both, which is not correct though (as it would mean
AmbientCapabilities=~0 followed by AmbientCapabilities=0) would result
in capability 63 to be set (which we don't really allow, since that
means unset).
|
|
|
|
|
| |
quite often we want to set a log target and immediately open it. Add a
common helper for that.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Config options are -Ddefault-timeout-sec= and -Ddefault-user-timeout-sec=.
Existing -Dupdate-helper-user-timeout= is renamed to -Dupdate-helper-user-timeout-sec=
for consistency. All three options take an integer value in seconds. The
renaming and type-change of the option is a small compat break, but it's just
at compile time and result in a clear error message. I also doubt that anyone was
actually using the option.
This commit separates the user manager timeouts, but keeps them unchanged at 90 s.
The timeout for the user manager is set to 4/3*user-timeout, which means that it
is still 120 s.
Fedora wants to experiment with lower timeouts, but doing this via a patch would
be annoying and more work than necessary. Let's make this easy to configure.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
And send READY=1 again when we are done with it.
We do this not only for "daemon-reload" but also for "daemon-reexec" and
"switch-root", since from the perspective of an encapsulating service
manager these three operations are not that different.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pointer passed in
In 'data' we get the location passed in we write stuff, and that's not
the Manager object.
And we neither get the Manager passed in via 'userdata', because at the
time we parse the emergency action for the manager the Manager is not
actually allocated yet.
hence, let's fix this differently, and pass in the user/system mode
descriptor via the 'ltype' argument.
Fixes: #25933
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a small cleanup removing the need for the spurious
*ret_shutdown_verb argument on invoke_main_loop() while moving
the MANAGER_OBJECTIVE::shutdown_verb string mapping local to
where it actually gets added to the shutdown argv in
become_shutdown().
This also eliminates the need for the several clearings of
*ret_shutdown_argv, and the streq() branches of shutdown_verb in
favor of plain equality tests against the objective value.
Nothing functionally has been changed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-1 was used everywhere, but -EBADF or -EBADFD started being used in various
places. Let's make things consistent in the new style.
Note that there are two candidates:
EBADF 9 Bad file descriptor
EBADFD 77 File descriptor in bad state
Since we're initializating the fd, we're just assigning a value that means
"no fd yet", so it's just a bad file descriptor, and the first errno fits
better. If instead we had a valid file descriptor that became invalid because
of some operation or state change, the other errno would fit better.
In some places, initialization is dropped if unnecessary.
|
|\
| |
| | |
Run generators with / ro and /tmp mounted
|
| |
| |
| |
| |
| |
| | |
RUN_WITH_UMASK was initially conceived for spawning externals progs with the
umask set. But nowadays we use it various syscalls and stuff that doesn't "run"
anything, so the "RUN_" prefix has outlived its usefulness.
|
|/
|
|
|
|
| |
Reloading is a heavy-weight operation, and currently it is not
possible to stop an orchestrator from spamming reload requests.
Add configuration options to allow rate-limiting.
|
| |
|
|
|
|
|
| |
This moves the formatting of the constant to compilation time and let's us
avoid asprintf() in the very hot path of initial boot.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When booting with debug logs, we print:
Setting '/proc/sys/fs/file-max' to '9223372036854775807
'
Setting '/proc/sys/fs/nr_open' to '2147483640
'
Couldn't write fs.nr_open as 2147483640, halving it.
Setting '/proc/sys/fs/nr_open' to '1073741816
'
Successfully bumped fs.nr_open to 1073741816
The strange formatting is because we explicitly appended a newline in those two
places. It seems that the kernel doesn't care. In fact, we have a few dozen other
writes to sysctl where we don't append a newline. So let's just drop those here
too, to make the code a bit simpler and avoid strange output in the logs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than passing seeds up to userspace via EFI variables, pass seeds
directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID.
EFI variables can potentially leak and suffer from forward secrecy
issues, and processing these with userspace means that they are
initialized much too late in boot to be useful. In contrast,
LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so
is hidden from userspace entirely, and is parsed extremely early on by
the kernel, so that every single call to get_random_bytes() by the
kernel is seeded.
In order to do this properly, we use a bit more robust hashing scheme,
and make sure that each input is properly memzeroed out after use. The
scheme is:
key = HASH(LABEL || sizeof(input1) || input1 || ... || sizeof(inputN) || inputN)
new_disk_seed = HASH(key || 0)
seed_for_linux = HASH(key || 1)
The various inputs are:
- LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
- 256 bits of seed from EFI's RNG
- The (immutable) system token, from its EFI variable
- The prior on-disk seed
- The UEFI monotonic counter
- A timestamp
This also adjusts the secure boot semantics, so that the operation is
only aborted if it's not possible to get random bytes from EFI's RNG or
a prior boot stage. With the proper hashing scheme, this should make
boot seeds safe even on secure boot.
There is currently a bug in Linux's EFI stub in which if the EFI stub
manages to generate random bytes on its own using EFI's RNG, it will
ignore what the bootloader passes. That's annoying, but it means that
either way, via systemd-boot or via EFI stub's mechanism, the RNG *does*
get initialized in a good safe way. And this bug is now fixed in the
efi.git tree, and will hopefully be backported to older kernels.
As the kernel recommends, the resultant seeds are 256 bits and are
allocated using pool memory of type EfiACPIReclaimMemory, so that it
gets freed at the right moment in boot.
|
|
|
|
|
|
| |
The name "def.h" originates from before the rule of "no needless abbreviations"
was established. Let's rename the file to clarify that it contains a collection
of various semi-related constants.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I wanted to move saved_arg[cv] to process-util.c+h, but this causes problems:
process-util.h includes format-util.h which includes net/if.h, which conflicts
with linux/if.h. So we can't include process-util.h in some files.
But process-util.c is very long anyway, so it seems nice to create a new file.
rename_process(), invoked_as(), invoked_by_systemd(), and argv_looks_like_help()
which lived in process-util.c refer to saved_argc and saved_argv, so it seems
reasonable to move them to the new file too.
util.c is now empty, so it is removed. util.h remains.
|
|
|
|
|
|
|
|
|
| |
I changed imports of util.h to initrd-util.h, or added an import of
initrd-util.h, to keep compilation working. It turns out that many files didn't
import util.h directly.
When viewing the patch, don't be confused by git rename detection logic:
a new .c file is added and two functions moved into it.
|