path: root/src/core/main.c
Commit message · Author · Age · Files · Lines
* conf-parser: Add root argument to config_parse_many() · Daan De Meyer · 2023-05-12 · 1 · -0/+1
|
* core: Try to initialize TERM from systemd.tty.term.console as well · Daan De Meyer · 2023-05-12 · 1 · -0/+6
|   We already have the systemd.tty.xxx kernel cmdline arguments for configuring ttys for services; let's make sure the term cmdline argument applies to pid1 as well.
* core: Check if any init exists before switching root · Daan De Meyer · 2023-05-12 · 1 · -0/+19
|   If we switch root and can't execute an init program afterwards, we're completely stuck as we can't go back to the initramfs to start emergency.service as it will have been completely removed by the switch root operation. To prevent leaving users with a completely undebuggable system, let's at least check before we switch root whether at least one of the init programs we might want to execute actually exists, and fail early if none of them exists.
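The pre-flight check described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `example_any_init_exists()` and the candidate list passed to it are assumptions, and plain `access()` does not resolve absolute symlinks relative to the new root the way a real implementation would need to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns true if at least one of the NULL-terminated
 * candidate init paths exists and is executable below new_root. */
static bool example_any_init_exists(const char *new_root, const char *const *candidates) {
        assert(new_root);
        assert(candidates);

        for (size_t i = 0; candidates[i]; i++) {
                char path[4096];
                int n = snprintf(path, sizeof(path), "%s%s", new_root, candidates[i]);

                if (n < 0 || (size_t) n >= sizeof(path))
                        continue; /* path does not fit into the buffer, skip this candidate */

                if (access(path, X_OK) == 0)
                        return true;
        }

        /* None of the candidates is usable: the caller should fail early
         * instead of switching root. */
        return false;
}
```

A caller would run this before the switch root operation and refuse to proceed when it returns false, so that the initramfs is still around to start emergency.service.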
* core: Make sure systemctl exit <X> works outside of a container · Daan De Meyer · 2023-05-12 · 1 · -4/+2
|   When running in a VM, we now support propagating the exit status via a vsock notify socket, so drop the restrictions on propagating an exit status when not in a container to make sure this works properly.
* parse-util: make parse_fd() return -EBADF · Yu Watanabe · 2023-05-08 · 1 · -3/+1
|   The previous error code -ERANGE is slightly ambiguous, so use a more specific one. This also drops unnecessary error handling. Follow-up for 754d8b9c330150fdb3767491e24975f7dfe2a203 and e652663a043cb80936bb12ad5c87766fc5150c24.
* main: improve log message · David Tardon · 2023-05-05 · 1 · -1/+1
|
* tree-wide: use parse_fd() · David Tardon · 2023-05-05 · 1 · -6/+4
|
* main: add missing return · David Tardon · 2023-05-05 · 1 · -1/+1
|   Follow-up-for: 2b5107e1625e0847179da0d35eb544192766886f
* switch-root: don't require /mnt/ when switching root into host OS · Lennart Poettering · 2023-05-03 · 1 · -1/+1
|   So far, we invoked pivot_root() specifying /mnt/ as second argument, which we then unmounted right after. We'd create /mnt/ if needed. This sucks, because it means /mnt/ must strictly be pre-created on immutable images. Remove this limitation by using pivot_root() with "." as source and target, which will result in two stacked mounts afterwards: the new one underneath, the old one on top. We can then simply unmount the top one, and have what we want without needing any extra /mnt/ dir. Since we don't need /mnt/ anymore we can get rid of the extra unmount_old_root parameter and simply specify it as NULL if we don't want the old mount to stick around.
* core: Parse logging environment earlier · Daan De Meyer · 2023-04-20 · 1 · -4/+10
|   Let's make sure we parse the logging environment ASAP so that the options apply to more code, e.g. to allow debugging kmod-setup.c.
* core/main: fix a typo for --log-target · Mike Yuan · 2023-04-13 · 1 · -1/+1
|   Follow-up for d2ebd50d7f9740dcf30e84efc75610af173967d2. Fixes #27105.
* core: Send ERRNO= via notify socket on exit · Daan De Meyer · 2023-04-12 · 1 · -0/+3
|
* core: Propagate exit status via notify socket when running in VM · Daan De Meyer · 2023-04-12 · 1 · -0/+4
|   When running in a container, we can propagate the exit status of pid1 as usual via the process exit status. This is not possible when running in a VM. Instead, let's send EXIT_STATUS=%i via the notify socket if one is configured. The user running the VM can then pick up the exit status from the notify socket after the VM has shut down.
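As a rough illustration of the message format named above, the following sketch only builds the `EXIT_STATUS=%i` payload; actually sending it over the notify socket is left out. The function name `example_format_exit_status()` is an assumption made for this example and does not come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "EXIT_STATUS=<status>" into buf.
 * Returns the string length on success, or -1 if the buffer is too small. */
static int example_format_exit_status(char *buf, size_t size, int status) {
        assert(buf);

        int n = snprintf(buf, size, "EXIT_STATUS=%i", status);
        if (n < 0 || (size_t) n >= size)
                return -1;

        return n;
}
```

The party running the VM would then parse the integer after `EXIT_STATUS=` from the datagram it receives once the VM has shut down.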
* Merge pull request #26887 from yuwata/proc-cmdline-filter-arguments · Zbigniew Jędrzejewski-Szmek · 2023-04-07 · 1 · -54/+6
|\    proc-cmdline: filter PID1 arguments on container
| * proc-cmdline: filter PID1 arguments when we are running in a container · Yu Watanabe · 2023-03-29 · 1 · -54/+6
| |   Otherwise, PID1 arguments, e.g. "--deserialize 16", may be parsed unexpectedly by generators. Fixes the issue reported at https://github.com/systemd/systemd/issues/24452#issuecomment-1475004433.
* | core/main: also check the argument terminator · Mike Yuan · 2023-04-03 · 1 · -1/+1
| |   For future-proofing, in case we add another option that starts with --deserialize. Addresses https://github.com/systemd/systemd/commit/4f44d2c4f76922a4f48dd4473e6abaca40d7e555#r107285603
* | core: do early setup check for arguments with '=' too · Mike Yuan · 2023-04-02 · 1 · -1/+1
| |   Follow-up for d2ebd50d7f9740dcf30e84efc75610af173967d2. We now modify our cmdline to use '=' for all arguments, but didn't change the early setup check to work with that. So every daemon-reexec does a full setup, thus breaking running user sessions. Fixes #27106.
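The kind of check these two commits touch (accepting an option written with '=', while also verifying the terminator so that a longer option sharing the same prefix is not matched) might look like the sketch below. The helper `example_arg_matches_option()` is hypothetical and is not the function used in main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if arg is exactly option, or option followed by
 * '=' and a value. A longer option that merely starts with the same prefix
 * (e.g. "--deserialize-foo") does not match. */
static bool example_arg_matches_option(const char *arg, const char *option) {
        assert(arg);
        assert(option);

        size_t n = strlen(option);

        if (strncmp(arg, option, n) != 0)
                return false;

        /* Check what terminates the prefix: end of string or '='. */
        return arg[n] == '\0' || arg[n] == '=';
}
```

With this shape, `--deserialize=16` and a bare `--deserialize` both match, whereas an option added later with the same prefix would be rejected.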
* | pid1: fully disable coredumping to $PWD · Zbigniew Jędrzejewski-Szmek · 2023-03-30 · 1 · -2/+0
|/    We have three states:
|     - ENABLE_COREDUMP and systemd-coredump is installed,
|     - ENABLE_COREDUMP but systemd-coredump is not installed,
|     - !ENABLE_COREDUMP.
|
|     In the last case we would not do any coredumping-related setup in pid1, which means that coredumps would go to the working directory of the process, but actually limits are set to 0. This is inherited by children of pid1. As discussed extensively in https://github.com/systemd/systemd/pull/26607, this default is bad: dumps are written to arbitrary directories and not cleaned up. Nevertheless, the kernel cannot really fix it. It doesn't know where to write, and it doesn't know when that place would become available. It is only the userspace that can tell this to the kernel. So the only sensible change in the kernel would be to default to '|/bin/false', i.e. do what we do now.
|
|     In the middle case, we disable writing of coredumps via a pattern, but raise RLIMIT_CORE. We need to raise the limit because we can't raise it later after processes have been forked off. This means we behave correctly, but allow coredumping to be enabled at a later point without a reboot.
|
|     This patch makes the last case behave like the middle case. This means that even if systemd is compiled without systemd-coredump, it still does the usual setup.
|
|     If users want to restore the kernel default, they need to provide two drop-in files:
|     - for sysctl.d, with 'kernel.core_pattern=core',
|     - for systemd.conf, with 'DefaultLimitCORE=0'.
|
|     The general idea is that pid1 does the safe thing. A distro may want to use something different than the systemd-coredump machinery, and then that could be packaged together with the drop-ins to change the configuration.
|
|     Alternative-for: #26607
* core/main: restore the correct assert about array position · Zbigniew Jędrzejewski-Szmek · 2023-03-26 · 1 · -1/+1
|   'pos' is incremented after each assignment. If we use the maximum number of arguments, we end up with pos==9 after all the assignments, and it points to where the next value would be assigned. This position must remain NULL. The assert I "fixed" was intentionally introduced in 26abdc73a212b90f7c4b71808a1028d2e87ab09f as a bugfix. So my "fix" repeated the same error that was fixed back then.
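The invariant described here can be illustrated with a small sketch: `pos` is incremented after every assignment, so after the last argument it indexes the slot that must stay NULL. The array size, the function name `example_fill_args()` and its parameters are assumptions chosen for brevity; the array in main.c is larger and filled differently.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative size: up to 4 arguments plus the terminating NULL. */
enum { EXAMPLE_ARGS_MAX = 5 };

/* Hypothetical helper: fills args with the mandatory and optional arguments
 * and returns the number of arguments written. */
static size_t example_fill_args(const char *args[EXAMPLE_ARGS_MAX],
                                const char *prog,
                                const char *verb,
                                const char *timeout,    /* optional, may be NULL */
                                const char *exit_code)  /* optional, may be NULL */ {
        size_t pos = 0;

        assert(args);
        assert(prog);
        assert(verb);

        for (size_t i = 0; i < EXAMPLE_ARGS_MAX; i++)
                args[i] = NULL;

        args[pos++] = prog;
        args[pos++] = verb;
        if (timeout)
                args[pos++] = timeout;
        if (exit_code)
                args[pos++] = exit_code;

        /* pos now points at the slot after the last argument. That slot must
         * lie inside the array and must still be NULL, because it is the
         * argv terminator. */
        assert(pos < EXAMPLE_ARGS_MAX);
        assert(args[pos] == NULL);

        return pos;
}
```

With all four arguments in use, `pos` ends at 4, the last valid index of the five-element array, which mirrors the pos==9 case in the message.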
* core/main: fix setting of arguments for shutdown · Zbigniew Jędrzejewski-Szmek · 2023-03-24 · 1 · -15/+22
|   Fixup for d2ebd50d7f9740dcf30e84efc75610af173967d2 and 6920049fad4fa39db5fec712f82f7f75b98fd4b9:
|   - add a comment that the last arg must be NULL and adjust the assert.
|   - move initialization around so that fields are declared, initialized, and consumed in the same order.
|   - move declaration of pos adjacent to declaration of command_line. This makes it easy to see that it was not initialized correctly.
|   - initialize buffers before writing the pointer into the args array. This makes no difference for the compiler, but it just feels "wrong" to do it in opposite order.
|
|   Because pos was off, we would ignore args after the timeout, and also overwrite the buffer if enough args were used.
|
|   I think this case shows clearly that declaring all variables at the top of the function, with some initialized and others not, is very error-prone. The compiler has no issue with declaring variables wherever, and we should take advantage of this to keep declaration, initialization, and use close. (Within reason of course.)
* core/main: make positional arguments followed by '=', then by value · Yu Watanabe · 2023-03-24 · 1 · -18/+15
|   So that ConditionKernelCommandLine= and friends are not confused when we are running in a container. Addresses https://github.com/systemd/systemd/pull/26887#discussion_r1143358884.
* core/main: fix maximum number of arguments for shutdown command · Yu Watanabe · 2023-03-24 · 1 · -1/+1
|   Follow-up for c5673ed0de3bec38f68d8113d253842b47766e27.
* conf: replace config_parse_many_nulstr() with config_parse_config_file() · Franck Bui · 2023-03-14 · 1 · -17/+16
|   All daemons use a similar scheme to read their main config files and their drop-ins. The main config files are always stored in the /etc/systemd directory and it's easy enough to construct the name of the drop-in directories based on the name of the main config file. Hence the new helper does that internally, which allows reducing and simplifying the args previously passed to config_parse_many_nulstr(). Besides the overall code simplification that results (16 files changed, 87 insertions(+), 159 deletions(-)), it makes it possible to clearly identify the locations in the code where configuration files are parsed.
* runtime-scope: add helper that turns RuntimeScope enum into --system/--user string · Lennart Poettering · 2023-03-10 · 1 · -1/+1
|
* basic: add RuntimeScope enum · Lennart Poettering · 2023-03-10 · 1 · -66/+97
|   In various tools and services we have a per-system and per-user concept. So far we sometimes used a boolean indicating whether we are in system mode, or a reversed boolean indicating whether we are in user mode, or the LookupScope enum used by the lookup path logic. Let's address that and introduce a common enum for this that we can use across the board. This is mostly just search/replace, no actual code changes.
* load-fragment: add user credential specifiers to user.conf · Ronan Pigott · 2023-03-10 · 1 · -2/+2
|   This enables the ManagerEnvironment= settings in the user's user.conf to reference some user data like $HOME for the purpose of setting environment variables derived from these values.
* core: add missing MemoryPressureWatch= and MemoryPressureThresholdSec= setting · Yu Watanabe · 2023-03-09 · 1 · -1/+1
|   Follow-up for #26393. Addresses https://github.com/systemd/systemd/pull/26393#issuecomment-1458655798.
* core: log message when reloading finishes · Luca Boccassi · 2023-03-08 · 1 · -1/+4
|   Reloading might be slow, especially when under memory pressure, and watchdogs might be triggered. It is useful to have timestamped telemetry in the journal to see how long a reload takes.
* pid1: add unit file settings to control memory pressure logic · Lennart Poettering · 2023-03-01 · 1 · -0/+9
|
* meson: merge our two valgrind configuration conditions into one · Zbigniew Jędrzejewski-Szmek · 2023-02-22 · 1 · -3/+3
|   Most of the support for valgrind was under HAVE_VALGRIND_VALGRIND_H, i.e. we would enable it if the valgrind headers were found. The operations would then be conditionalized on RUNNING_UNDER_VALGRIND. But in a few places we had code which was conditionalized on VALGRIND, i.e. the config option.
|
|   I noticed because I compiled with -Dvalgrind=true on a machine that didn't have valgrind.h, and the build failed because RUNNING_UNDER_VALGRIND was not defined. My first idea was to add a check that the header is present if the option is set, but it seems better to just remove the option. The code to support valgrind is trivial, and if we're !RUNNING_UNDER_VALGRIND, it has negligible cost. And the case of running under valgrind is always some special testing/debugging mode, so we should just do those extra steps to make valgrind output cleaner. Removing the option makes things simpler and we don't have to think about whether something should be covered by the one or the other configuration bit.
|
|   I had a vague recollection that in some places we used -Dvalgrind=true not for valgrind support, but to enable additional cleanup under other sanitizers. But that code would fail to build without the valgrind headers anyway, so I'm not sure if that was still used. If there are uses like that, we can extend the condition for cleanup_pools().
* capability-util: add CAP_MASK_ALL + CAP_MASK_UNSET macros · Lennart Poettering · 2023-02-20 · 1 · -1/+1
|   We should be more careful with distinguishing the cases "all bits set in caps mask" from "cap mask invalid". We so far mostly used UINT64_MAX for both, which is not correct though (as it would mean AmbientCapabilities=~0 followed by AmbientCapabilities=0 would result in capability 63 being set, which we don't really allow, since that means unset).
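One way to keep "all capabilities" and "mask unset" apart is sketched below. The `EXAMPLE_` macros are stand-ins defined for this illustration; their values are an assumption derived from the message (bit 63 is reserved to mean "unset") and are not copied from capability-util.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only: "unset" is the all-ones sentinel,
 * "all" covers bits 0..62 so it can never collide with the sentinel. */
#define EXAMPLE_CAP_MASK_UNSET UINT64_MAX
#define EXAMPLE_CAP_MASK_ALL   (UINT64_MAX >> 1)

static bool example_cap_mask_is_set(uint64_t mask) {
        return mask != EXAMPLE_CAP_MASK_UNSET;
}

/* Inverts a valid mask without ever producing the "unset" sentinel. */
static uint64_t example_cap_mask_invert(uint64_t mask) {
        assert(example_cap_mask_is_set(mask));
        return ~mask & EXAMPLE_CAP_MASK_ALL;
}
```

Under these assumptions, inverting an empty mask yields "all capabilities", which remains distinguishable from "no mask configured".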
* log: add common helper log_set_target_and_open() · Lennart Poettering · 2023-02-16 · 1 · -6/+3
|   Quite often we want to set a log target and immediately open it. Add a common helper for that.
* pid1: generate compat warning for SystemCallArchitectures= if seccomp is off · Lennart Poettering · 2023-02-16 · 1 · -0/+3
|
* core: split system/user job timeouts and make them configurable · Zbigniew Jędrzejewski-Szmek · 2023-02-01 · 1 · -4/+4
|   Config options are -Ddefault-timeout-sec= and -Ddefault-user-timeout-sec=. Existing -Dupdate-helper-user-timeout= is renamed to -Dupdate-helper-user-timeout-sec= for consistency. All three options take an integer value in seconds. The renaming and type-change of the option is a small compat break, but it's just at compile time and results in a clear error message. I also doubt that anyone was actually using the option.
|
|   This commit separates the user manager timeouts, but keeps them unchanged at 90 s. The timeout for the user manager is set to 4/3*user-timeout, which means that it is still 120 s.
|
|   Fedora wants to experiment with lower timeouts, but doing this via a patch would be annoying and more work than necessary. Let's make this easy to configure.
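The arithmetic in the message (a user timeout of 90 s giving a user manager timeout of 120 s) can be checked with a short sketch. The function name is hypothetical; only the 4/3 factor and the two values come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derives the user manager timeout from the per-user
 * job timeout using the 4/3 factor stated in the commit message. */
static uint64_t example_user_manager_timeout_sec(uint64_t user_timeout_sec) {
        assert(user_timeout_sec <= UINT64_MAX / 4); /* guard against overflow */
        return user_timeout_sec * 4 / 3;
}
```

Multiplying before dividing keeps the integer result exact for the default: 90 * 4 / 3 = 120.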
* os-util: optionally, return EOL time in os_release_support_ended() · Lennart Poettering · 2023-01-24 · 1 · -1/+1
|
* tree-wide: unify how we pick OS pretty name to display · Lennart Poettering · 2023-01-24 · 1 · -1/+1
|
* pid1: make sure we send our calling service manager RELOADING=1 when reloading · Lennart Poettering · 2023-01-10 · 1 · -0/+10
|   And send READY=1 again when we are done with it. We do this not only for "daemon-reload" but also for "daemon-reexec" and "switch-root", since from the perspective of an encapsulating service manager these three operations are not that different.
* load-fragment: config_parse_emergency_action() doesn't ever get a Manager pointer passed in · Lennart Poettering · 2023-01-06 · 1 · -1/+1
|   In 'data' we get passed the location we write to, and that's not the Manager object. Nor do we get the Manager passed in via 'userdata', because at the time we parse the emergency action for the manager, the Manager is not actually allocated yet. Hence, let's fix this differently, and pass in the user/system mode descriptor via the 'ltype' argument. Fixes: #25933
* manager: perform objective->shutdown_verb mapping locally · Vito Caputo · 2023-01-02 · 1 · -26/+23
|   This is a small cleanup removing the need for the spurious *ret_shutdown_verb argument on invoke_main_loop() while moving the MANAGER_OBJECTIVE::shutdown_verb string mapping local to where it actually gets added to the shutdown argv in become_shutdown(). This also eliminates the need for the several clearings of *ret_shutdown_argv, and the streq() branches of shutdown_verb in favor of plain equality tests against the objective value. Nothing functionally has been changed.
* tree-wide: use -EBADF for fd initialization · Zbigniew Jędrzejewski-Szmek · 2022-12-19 · 1 · -1/+1
|   -1 was used everywhere, but -EBADF or -EBADFD started being used in various places. Let's make things consistent in the new style. Note that there are two candidates:
|
|     EBADF   9  Bad file descriptor
|     EBADFD 77  File descriptor in bad state
|
|   Since we're initializing the fd, we're just assigning a value that means "no fd yet", so it's just a bad file descriptor, and the first errno fits better. If instead we had a valid file descriptor that became invalid because of some operation or state change, the other errno would fit better. In some places, initialization is dropped if unnecessary.
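The convention can be shown in a few lines: a descriptor that does not refer to anything yet holds `-EBADF`, any negative value counts as "not open", and closing resets the variable to the same sentinel. The helper names below are made up for this sketch and are not systemd's own helpers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Any negative value, including -EBADF, means "no fd yet". */
static bool example_fd_is_open(int fd) {
        return fd >= 0;
}

/* Hypothetical helper: returns an open fd, or a negative errno-style value. */
static int example_open_readonly(const char *path) {
        assert(path);

        int fd = open(path, O_RDONLY);
        return fd >= 0 ? fd : -errno;
}

/* Closes *fd if it is open and resets it to the "no fd" sentinel. */
static void example_close_and_reset(int *fd) {
        assert(fd);

        if (*fd >= 0)
                (void) close(*fd);
        *fd = -EBADF;
}
```

Because the sentinel is a negative errno value, an uninitialized descriptor and a failed open can be handled by the same `fd < 0` check.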
* Merge pull request #25723 from keszybz/generators-tmp · Yu Watanabe · 2022-12-15 · 1 · -1/+1
|\    Run generators with / ro and /tmp mounted
| * treewide: drop "RUN_" from "RUN_WITH_UMASK" · Zbigniew Jędrzejewski-Szmek · 2022-12-13 · 1 · -1/+1
| |   RUN_WITH_UMASK was initially conceived for spawning external progs with the umask set. But nowadays we use it for various syscalls and stuff that doesn't "run" anything, so the "RUN_" prefix has outlived its usefulness.
* | manager: add option to rate limit daemon-reload · Luca Boccassi · 2022-12-13 · 1 · -0/+33
|/    Reloading is a heavy-weight operation, and currently it is not possible to stop an orchestrator from spamming reload requests. Add configuration options to allow rate-limiting.
* manager: write net/unix/max_dgram_qlen sysctl as fixed string · Zbigniew Jędrzejewski-Szmek · 2022-12-03 · 1 · -2/+1
|
* manager: define a string constant for LONG_MAX and use that for sysctl · Zbigniew Jędrzejewski-Szmek · 2022-12-03 · 1 · -1/+1
|   This moves the formatting of the constant to compilation time and lets us avoid asprintf() in the very hot path of initial boot.
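A compile-time string for LONG_MAX could be defined along the following lines. The macro name `EXAMPLE_LONG_MAX_STR` and the preprocessor comparison are assumptions made for this sketch; the commit may select the string differently. The check function only confirms that the literal agrees with what `snprintf()` would produce at run time.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Pick the decimal string matching this platform's LONG_MAX at compile time,
 * so no formatting call is needed at run time. */
#if LONG_MAX == 9223372036854775807L
#  define EXAMPLE_LONG_MAX_STR "9223372036854775807"
#elif LONG_MAX == 2147483647L
#  define EXAMPLE_LONG_MAX_STR "2147483647"
#else
#  error "Unexpected LONG_MAX value"
#endif

/* Sanity check: the string literal equals the run-time formatting. */
static bool example_long_max_str_is_consistent(void) {
        char buf[32];
        int n = snprintf(buf, sizeof(buf), "%ld", LONG_MAX);

        return n > 0 && (size_t) n < sizeof(buf) &&
                strcmp(buf, EXAMPLE_LONG_MAX_STR) == 0;
}
```

The literal can then be passed directly to whatever writes the sysctl, with no allocation on the boot path.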
* manager: do not append '\n' when writing sysctl settings · Zbigniew Jędrzejewski-Szmek · 2022-12-03 · 1 · -3/+4
|   When booting with debug logs, we print:
|
|     Setting '/proc/sys/fs/file-max' to '9223372036854775807
|     '
|     Setting '/proc/sys/fs/nr_open' to '2147483640
|     '
|     Couldn't write fs.nr_open as 2147483640, halving it.
|     Setting '/proc/sys/fs/nr_open' to '1073741816
|     '
|     Successfully bumped fs.nr_open to 1073741816
|
|   The strange formatting is because we explicitly appended a newline in those two places. It seems that the kernel doesn't care. In fact, we have a few dozen other writes to sysctl where we don't append a newline. So let's just drop those here too, to make the code a bit simpler and avoid strange output in the logs.
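The retry pattern visible in that log (try a value, halve it on failure, try again) can be sketched as below. Everything here is illustrative: the writer is a callback standing in for the sysctl write, and rounding down to a multiple of 8 is inferred from the logged values (2147483640 halves to 1073741820, while the log shows 1073741816), not taken from the source code.

```c
#include <assert.h>
#include <stddef.h>

/* Callback standing in for the real sysctl write: returns >= 0 on success. */
typedef int (*example_write_fn)(int value, void *userdata);

/* Hypothetical helper: tries start, then keeps halving (rounded down to a
 * multiple of 8) until the write succeeds or the value drops below floor.
 * Returns the accepted value, or -1 if nothing was accepted. */
static int example_bump_with_halving(int start, int floor, example_write_fn write_value, void *userdata) {
        assert(floor > 0);
        assert(write_value);

        for (int v = start; v >= floor; v = (v / 2) & ~7) {
                if (write_value(v, userdata) >= 0)
                        return v;
        }

        return -1;
}

/* Simulated kernel: accepts any value up to the limit stored in userdata. */
static int example_accept_up_to(int value, void *userdata) {
        const int *limit = userdata;

        assert(limit);
        return value <= *limit ? 0 : -1;
}
```

Starting from 2147483640 against a simulated limit of 1073741816, the first write fails and the second succeeds, matching the sequence in the quoted log.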
* boot: implement kernel EFI RNG seed protocol with proper hashing · Jason A. Donenfeld · 2022-11-14 · 1 · -2/+2
|   Rather than passing seeds up to userspace via EFI variables, pass seeds directly to the kernel's EFI stub loader, via LINUX_EFI_RANDOM_SEED_TABLE_GUID. EFI variables can potentially leak and suffer from forward secrecy issues, and processing these with userspace means that they are initialized much too late in boot to be useful. In contrast, LINUX_EFI_RANDOM_SEED_TABLE_GUID uses EFI configuration tables, and so is hidden from userspace entirely, and is parsed extremely early on by the kernel, so that every single call to get_random_bytes() by the kernel is seeded.
|
|   In order to do this properly, we use a bit more robust hashing scheme, and make sure that each input is properly memzeroed out after use. The scheme is:
|
|     key = HASH(LABEL || sizeof(input1) || input1 || ... || sizeof(inputN) || inputN)
|     new_disk_seed = HASH(key || 0)
|     seed_for_linux = HASH(key || 1)
|
|   The various inputs are:
|   - LINUX_EFI_RANDOM_SEED_TABLE_GUID from prior bootloaders
|   - 256 bits of seed from EFI's RNG
|   - The (immutable) system token, from its EFI variable
|   - The prior on-disk seed
|   - The UEFI monotonic counter
|   - A timestamp
|
|   This also adjusts the secure boot semantics, so that the operation is only aborted if it's not possible to get random bytes from EFI's RNG or a prior boot stage. With the proper hashing scheme, this should make boot seeds safe even on secure boot.
|
|   There is currently a bug in Linux's EFI stub in which if the EFI stub manages to generate random bytes on its own using EFI's RNG, it will ignore what the bootloader passes. That's annoying, but it means that either way, via systemd-boot or via EFI stub's mechanism, the RNG *does* get initialized in a good safe way. And this bug is now fixed in the efi.git tree, and will hopefully be backported to older kernels.
|
|   As the kernel recommends, the resultant seeds are 256 bits and are allocated using pool memory of type EfiACPIReclaimMemory, so that it gets freed at the right moment in boot.
* Rename def.h to constants.h · Zbigniew Jędrzejewski-Szmek · 2022-11-08 · 1 · -1/+1
|   The name "def.h" originates from before the rule of "no needless abbreviations" was established. Let's rename the file to clarify that it contains a collection of various semi-related constants.
* basic: move a bunch of cmdline-related funcs to new argv-util.c+h · Zbigniew Jędrzejewski-Szmek · 2022-11-08 · 1 · -0/+1
|   I wanted to move saved_arg[cv] to process-util.c+h, but this causes problems: process-util.h includes format-util.h which includes net/if.h, which conflicts with linux/if.h. So we can't include process-util.h in some files. But process-util.c is very long anyway, so it seems nice to create a new file.
|
|   rename_process(), invoked_as(), invoked_by_systemd(), and argv_looks_like_help() which lived in process-util.c refer to saved_argc and saved_argv, so it seems reasonable to move them to the new file too.
|
|   util.c is now empty, so it is removed. util.h remains.
* basic: create new basic/initrd-util.[ch] for initrd-related functions · Zbigniew Jędrzejewski-Szmek · 2022-11-08 · 1 · -1/+1
|   I changed imports of util.h to initrd-util.h, or added an import of initrd-util.h, to keep compilation working. It turns out that many files didn't import util.h directly. When viewing the patch, don't be confused by git rename detection logic: a new .c file is added and two functions moved into it.