path: root/src/core/load-fragment.h
Commit message (author, age, files, lines)
* core: add DelegateSubgroup= setting (Lennart Poettering, 2023-04-27, 1 file, -0/+1)
|   This implements a minimal subset of #24961, but in a much more restrictive way: we only allow one level of subcgroup (as that's enough to address the no-processes-in-inner-cgroups rule), and we do not change anything about threaded cgroup logic or similar, or make any of this new behaviour mandatory.
|
|   All this does is this: all non-control processes we invoke for a unit we'll invoke in a subgroup by the specified name.
|
|   We'll later port all our current services that use cgroup delegation over to this, i.e. user@.service, systemd-nspawn@.service and systemd-udevd.service.
* image-policy: introduce parse_image_policy_argument() helper (Yu Watanabe, 2023-04-13, 1 file, -1/+0)
|   Addresses
|   https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1060130312,
|   https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1067927293, and
|   https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1067926416.
|
|   Follow-up for 84be0c710d9d562f6d2cf986cc2a8ff4c98a138b.
* service: add ability to pin fd store (Lennart Poettering, 2023-04-13, 1 file, -1/+1)
|   Oftentimes it is useful to allow the per-service fd store to survive longer than for a restart. This is useful in various scenarios:
|
|   1. An fd to some security relevant object needs to be stashed somewhere, that should not be cleaned automatically, because the security enforcement would be dropped then.
|   2. A user namespace fd should be allocated on first invocation and be kept around until the user logs out (i.e. systemd --user ends), à la #16328. (This does not implement what #16318 asks for, but should solve the use-case discussed there.)
|   3. There's interest in allowing a concept of "userspace reboots" where the kernel stays running, and userspace is swapped out (i.e. all services exit, and the rootfs is transitioned into a new version of it) while keeping some select resources pinned, very similar to how we implement a switch root.
|
|   Thus it is useful to allow services to exit, while leaving their fds around till the very end.
|
|   This is exposed through a new FileDescriptorStorePreserve= setting that is closely modelled after RuntimeDirectoryPreserve= (in fact it reuses the same internal type), since we want similar behaviour in the end, and quite often they probably want to be used together.
* tree-wide: hook up image dissection policy logic everywhere (Lennart Poettering, 2023-04-05, 1 file, -0/+1)
|
* core: rename "mount_flags" → "mount_propagation_flag" internally where appropriate (Lennart Poettering, 2023-03-14, 1 file, -1/+1)
|   ExecContext has a field that controls the mount propagation flag of the mounts in the resulting namespace. This is exposed as "MountFlags=" which is super confusing, as it suggests one could control more than propagation, and that it was actually a flags field. It's an enum though only, and nothing else.
|
|   We might want to rename this externally one day, but given the compat kludges this requires and the fact this is somewhat nichey it might not be worth it. But internally let's rename it, as it makes things much easier to grok, in particular as part of the codebase already exposed the concept as mount_propagation_flag.
|
|   No actual code flow changes, just some renaming.
* core: add missing MemoryPressureWatch= and MemoryPressureThresholdSec= setting (Yu Watanabe, 2023-03-09, 1 file, -1/+1)
|   Follow-up for #26393.
|
|   Addresses https://github.com/systemd/systemd/pull/26393#issuecomment-1458655798.
* pid1: add unit file settings to control memory pressure logic (Lennart Poettering, 2023-03-01, 1 file, -0/+1)
|
* core: add OpenFile setting (Richard Phibel, 2023-01-10, 1 file, -0/+1)
|
* journal: log filtering options support in PID1 (Quentin Deslandes, 2022-12-15, 1 file, -0/+1)
|   Define a new unit parameter (LogFilterPatterns) to filter logs processed by journald.
|
|   This option is used to store a regular expression which is carried from PID1 to systemd-journald through a cgroup xattr: `user.journald_log_filter_patterns`.
* core/load-fragment: move config_parse_sec_fix_0 to src/shared (Michal Sekletar, 2022-08-23, 1 file, -1/+0)
|
* core/cgroup: CPUWeight/CPUShares support idle input (wineway, 2022-08-11, 1 file, -0/+1)
|   Signed-off-by: wineway <wangyuweihx@gmail.com>
* Revert NFTSet feature (Yu Watanabe, 2022-06-22, 1 file, -2/+0)
|   This reverts PR #22587 and its follow-up commit. More specifically, 2299b1cae32c1fb8911da0ce26efced68032f4f8 (partially), e176f855278d5098d3fecc5aa24ba702147d42e0, ceb46a31a01b3d3d1d6095d857e29ea214a2776b, and 51bb9076ab8c050bebb64db5035852385accda35.
|
|   The PR was merged without final approval, and has several issues:
|
|   - OSS fuzz reported issues in the conf parser,
|   - It makes a synchronous netlink call, which it should not, especially in PID1,
|   - The importance of NFTSet for CGroup and DynamicUser may be questionable; at least, there was no justification why PID1 should support it,
|   - For networkd, it should be implemented with a Request object,
|   - There is no test for the feature.
|
|   Fixes #23711.
|   Fixes #23717.
|   Fixes #23719.
|   Fixes #23720.
|   Fixes #23721.
|   Fixes #23759.
* core: firewall integration with DynamicUserNFTSet= (Topi Miettinen, 2022-06-08, 1 file, -0/+1)
|   New directive `DynamicUserNFTSet=` provides a method for integrating configuration of dynamic users into firewall rules with NFT sets.
|
|   Example:
|
|   ```
|   table inet filter {
|           set u {
|                   typeof meta skuid
|           }
|
|           chain service_output {
|                   meta skuid != @u drop
|                   accept
|           }
|   }
|   ```
|
|   ```
|   /etc/systemd/system/dunft.service
|   [Service]
|   DynamicUser=yes
|   DynamicUserNFTSet=inet:filter:u
|   ExecStart=/bin/sleep 1000
|
|   [Install]
|   WantedBy=multi-user.target
|   ```
|
|   ```
|   $ sudo nft list set inet filter u
|   table inet filter {
|           set u {
|                   typeof meta skuid
|                   elements = { 64864 }
|           }
|   }
|   $ ps -n --format user,group,pid,command -p `pgrep sleep`
|       USER    GROUP     PID COMMAND
|      64864    64864   55158 /bin/sleep 1000
|   ```
* core: firewall integration with ControlGroupNFTSet= (Topi Miettinen, 2022-06-08, 1 file, -0/+1)
|   New directive `ControlGroupNFTSet=` provides a method for integrating services into firewall rules with NFT sets.
|
|   Example:
|
|   ```
|   table inet filter {
|   ...
|           set timesyncd {
|                   type cgroupsv2
|           }
|
|           chain ntp_output {
|                   socket cgroupv2 != @timesyncd counter drop
|                   accept
|           }
|   ...
|   }
|   ```
|
|   /etc/systemd/system/systemd-timesyncd.service.d/override.conf
|
|   ```
|   [Service]
|   ControlGroupNFTSet=inet:filter:timesyncd
|   ```
|
|   ```
|   $ sudo nft list set inet filter timesyncd
|   table inet filter {
|           set timesyncd {
|                   type cgroupsv2
|                   elements = { "system.slice/systemd-timesyncd.service" }
|           }
|   }
|   ```
* Merge pull request #20813 from unusual-thoughts/exittype_v2 (Zbigniew Jędrzejewski-Szmek, 2021-11-08, 1 file, -0/+1)
|\
| |   Reintroduce ExitType
| * Reintroduce ExitType (Henri Chain, 2021-11-08, 1 file, -0/+1)
| |   This introduces `ExitType=main|cgroup` for services. Similar to how `Type` specifies the launch of a service, `ExitType` is concerned with how systemd determines that a service exited.
| |
| |   - If set to `main` (the current behavior), the service manager will consider the unit stopped when the main process exits.
| |   - The `cgroup` exit type is meant for applications whose forking model is not known ahead of time and which might not have a specific main process. The service will stay running as long as at least one process in the cgroup is running. This is intended for transient or automatically generated services, such as graphical applications inside of a desktop environment.
| |
| |   Motivation for this is #16805. The original PR (#18782) was reverted (#20073) after realizing that the exit status of "the last process in the cgroup" can't reliably be known (#19385).
| |
| |   This version instead uses the main process exit status if there is one and just listens to the cgroup empty event otherwise.
| |
| |   The advantages of a service with `ExitType=cgroup` over scopes are:
| |
| |   - Integrated logging / stdout redirection
| |   - Avoids the race / synchronisation issue between launch and scope creation
| |   - More extensive use of drop-ins and thus distro-level configuration: by moving from scopes to services we can have drop-ins that will affect properties that can only be set during service creation, like `OOMPolicy` and security-related properties
| |   - It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously only services can be used in the generator, not scopes.
| |
| |   [1] https://bugs.kde.org/show_bug.cgi?id=433299
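The two `ExitType=` values described in the commit message above can be illustrated with a small Python model. This is only a sketch of the stated semantics, not systemd's implementation; the function name `service_considered_running` and its parameters are hypothetical.

```python
def service_considered_running(exit_type, main_alive, cgroup_pids):
    """Model of the ExitType= semantics from the commit message.

    exit_type   -- "main" or "cgroup" (the two values named in the commit)
    main_alive  -- whether the main process is still running (hypothetical input)
    cgroup_pids -- PIDs currently in the unit's cgroup (hypothetical input)
    """
    if exit_type == "main":
        # The unit counts as stopped as soon as the main process exits.
        return main_alive
    if exit_type == "cgroup":
        # The unit stays running while at least one process is left in the cgroup.
        return len(cgroup_pids) > 0
    raise ValueError(f"unknown ExitType: {exit_type!r}")
```

With a dead main process and one leftover child, the model reports "stopped" for `main` and "running" for `cgroup`.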
* | exec: Add TTYRows and TTYColumns properties to set TTY dimensions (Daan De Meyer, 2021-11-05, 1 file, -0/+1)
|/
* core: Try to prevent infinite recursive template instantiation (Daan De Meyer, 2021-10-28, 1 file, -0/+4)
|   To prevent situations like in #17602 from happening, let's drop direct recursive template dependencies. These will almost certainly lead to infinite recursion so let's drop them immediately to avoid instantiating potentially thousands of irrelevant units.
|
|   Example of a template that would lead to infinite recursion which is caught by this check:
|
|   notify@.service:
|
|   ```
|   [Unit]
|   Wants=notify@%n.service
|   ```
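The kind of check described above can be sketched in Python. This is an illustration under simplifying assumptions, not the check systemd performs: only the `%n` (full unit name) and `%%` specifiers are expanded, and the helper names `template_of`, `expand_specifiers` and `is_direct_template_recursion` are hypothetical.

```python
def template_of(unit_name):
    """Return 'prefix@.suffix' for an instance such as 'prefix@inst.suffix', else None."""
    stem, dot, suffix = unit_name.rpartition(".")
    if not dot or "@" not in stem:
        return None
    prefix, _, _instance = stem.partition("@")
    return f"{prefix}@.{suffix}"


def expand_specifiers(value, unit_name):
    """Expand only %n and %% (a deliberate simplification)."""
    return value.replace("%%", "\0").replace("%n", unit_name).replace("\0", "%")


def is_direct_template_recursion(unit_name, raw_dependency):
    """True if a specifier-based dependency points back at the unit's own template."""
    if "%" not in raw_dependency:
        return False  # assumption: explicitly named instances are left alone
    own_template = template_of(unit_name)
    dependency = expand_specifiers(raw_dependency, unit_name)
    return own_template is not None and template_of(dependency) == own_template
```

For the example in the commit message, `notify@foo.service` with `Wants=notify@%n.service` expands to `notify@notify@foo.service.service`, which is again an instance of `notify@.service`, so the sketch flags it.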
* Merge pull request #20787 from fbuihuu/watchdog-more-rework (Lennart Poettering, 2021-10-13, 1 file, -0/+1)
|\
| |   Watchdog more rework
| * watchdog: rename special string "infinity" taken by the watchdog timeout options to "default" (Franck Bui, 2021-10-13, 1 file, -0/+1)
* | core: add RestrictFileSystems= fragment parser (Iago López Galeiras, 2021-10-06, 1 file, -0/+1)
|/
|   It takes an allow or deny list of filesystems services should have access to.
* core: Add ExecSearchPath parameter to specify the directory relative to which binaries executed by Exec*= should be found (alexlzhu, 2021-09-28, 1 file, -0/+1)
|   Currently there is no way to specify a path relative to which all binaries executed by Exec*= should be found. The only way is to specify the absolute path.
|
|   This change implements the functionality to specify a path relative to which binaries executed by Exec*= can be found.
|
|   Closes #6308
* cgroup: add support for StartupAllowedCPUs and StartupAllowedMemoryNodes (Peter Morrow, 2021-09-15, 1 file, -2/+1)
|   Add new settings which can be used to control cpuset based cpu affinity during the startup phase only.
|
|   Signed-off-by: Peter Morrow <pemorrow@linux.microsoft.com>
* core: add load fragment implementation for RestrictNetworkInterfaces= (Mauricio Vásquez, 2021-08-18, 1 file, -0/+1)
|   Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
* Revert "Introduce ExitType" (Zbigniew Jędrzejewski-Szmek, 2021-06-30, 1 file, -1/+0)
|   This reverts commit cb0e818f7cc2499d81ef143e5acaa00c6e684711.
|
|   After this was merged, some design and implementation issues were discovered, see the discussion in #18782 and #19385. They certainly can be fixed, but so far nobody has stepped up, and we're nearing a release. Hopefully, this feature can be merged again after a rework.
|
|   Fixes #19345.
* core: add SocketBind{Allow|Deny} fragment parser (Julia Kartseva, 2021-04-26, 1 file, -0/+1)
|
* core: add bpf-foreign to fragment parser (Julia Kartseva, 2021-04-09, 1 file, -0/+1)
|   - Parse a string for bpf attach type
|   - Simplify bpffs path
|   - Add foreign bpf program to cgroup context
* Introduce ExitType (Henri Chain, 2021-03-31, 1 file, -0/+1)
|
* Add ExtensionImages directive to form overlays (Luca Boccassi, 2021-02-23, 1 file, -0/+1)
|   Add support for overlaying images for services on top of their root fs, using a read-only overlay.
* oom: add unit file settings for oomd avoid/omit xattrs (Anita Zhang, 2021-02-12, 1 file, -0/+1)
|
* license: LGPL-2.1+ -> LGPL-2.1-or-later (Yu Watanabe, 2020-11-09, 1 file, -1/+1)
|
* core: add Timestamping= option for socket units (Lennart Poettering, 2020-10-27, 1 file, -0/+1)
|   This adds a way to control SO_TIMESTAMP/SO_TIMESTAMPNS socket options for sockets PID 1 binds to.
|
|   This is useful in journald so that we get proper timestamps even for ingress log messages that are submitted before journald is running.
|
|   We recently turned on packet info metadata from PID 1 for these sockets, but the timestamping info was still missing. Let's correct that.
* core: add ManagedOOM*= properties to configure systemd-oomd on the unit (Anita Zhang, 2020-10-07, 1 file, -0/+2)
|   This adds the hook-ups so it can be read with the usual systemd utilities. Used in later commits by systemd-oomd.
* core: remember when we set ExecContext.mount_apivfs (Zbigniew Jędrzejewski-Szmek, 2020-09-24, 1 file, -0/+1)
|   No functional change intended so far.
* exec: SystemCallLog= directive (Topi Miettinen, 2020-09-15, 1 file, -0/+1)
|   With new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters.
|
|   ---
|   v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
|   v4: skip useless debug messages, actually parse directive
|   v3: don't declare unused variables with old libseccomp
|   v2: fix build without seccomp or old libseccomp
* core: add credentials logic (Lennart Poettering, 2020-08-25, 1 file, -0/+2)
|   Fixes: #15778 #16060
* core: introduce ProtectProc= and ProcSubset= to expose hidepid= and subset= procfs mount options (Lennart Poettering, 2020-08-24, 1 file, -0/+2)
|   Kernel 5.8 gained a hidepid= implementation that is truly per procfs, which allows us to mount a distinct one into every unit, with individual hidepid= settings. Let's expose this via two new settings: ProtectProc= (wrapping hidepid=) and ProcSubset= (wrapping subset=).
|
|   Replaces: #11670
* core: remove support for ConditionNull= (Lennart Poettering, 2020-08-20, 1 file, -1/+0)
|   The concept is flawed, and mostly useless. Let's finally remove it.
|
|   It has been deprecated since 90a2ec10f2d43a8530aae856013518eb567c4039 (6 years ago) and we started to warn since 55dadc5c57ef1379dbc984938d124508a454be55 (1.5 years ago).
|
|   Let's get rid of it altogether.
* core: new feature MountImages (Luca Boccassi, 2020-08-05, 1 file, -0/+1)
|   Follows the same pattern and features as RootImage, but allows an arbitrary mount point under / to be specified by the user, and multiple values - like BindPaths.
|
|   Original implementation by @topimiettinen at: https://github.com/systemd/systemd/pull/14451
|   Reworked to use dissect's logic instead of bare libmount() calls and other review comments.
|
|   Thanks Topi for the initial work to come up with and implement this useful feature.
* service: add new RootImageOptions feature (Luca Boccassi, 2020-07-29, 1 file, -0/+1)
|   Allows specifying mount options for RootImage. In case of multi-partition images, the partition number can be given as a prefix followed by a colon. E.g.:
|
|   RootImageOptions=1:ro,dev 2:nosuid nodev
|
|   In absence of a partition number, 0 is assumed.
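The syntax in the example above can be read as whitespace-separated entries, each optionally prefixed with a partition number and a colon. The following Python sketch parses a value that way; it follows only the wording of this commit message, the function name `parse_root_image_options` is hypothetical, and letting a later entry override an earlier one for the same partition is an assumption.

```python
def parse_root_image_options(value):
    """Map partition number -> mount option string, per the commit message's syntax."""
    options = {}
    for word in value.split():
        head, sep, rest = word.partition(":")
        if sep and head.isdigit():
            # "<partition>:<options>" form
            partition, opts = int(head), rest
        else:
            # No partition number given: 0 is assumed.
            partition, opts = 0, word
        options[partition] = opts  # assumption: the last entry for a partition wins
    return options
```

Applied to the example value `1:ro,dev 2:nosuid nodev`, the sketch assigns `ro,dev` to partition 1, `nosuid` to partition 2 and `nodev` to partition 0.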
* core: add RootHashSignature service parameter (Luca Boccassi, 2020-06-25, 1 file, -0/+1)
|   Allow explicitly passing the root hash signature as a unit option. Takes precedence over implicit checks.
* core: add RootHash and RootVerity service parameters (Luca Boccassi, 2020-06-23, 1 file, -0/+1)
|   Allow explicitly passing the root hash (explicitly or as a file) and the verity device/file as unit options. These take precedence over implicit checks.
* core: let user define start-/stop-timeout behaviour (Jan Klötzke, 2020-06-09, 1 file, -0/+1)
|   The usual behaviour when a timeout expires is to terminate/kill the service. This is what users usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode.
|
|   This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behavior of systemd when a start/stop timeout expires:
|
|   * 'terminate': is the default behaviour as it has always been,
|   * 'abort': triggers the watchdog machinery and will send SIGABRT (unless WatchdogSignal was changed) and
|   * 'kill' will directly send SIGKILL.
|
|   To handle the stop failure mode in stop-post state too, a new final-watchdog state needs to be introduced.
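The three failure modes listed above can be summarised as a mapping from mode to the first signal sent when the timeout expires. The Python sketch below is illustrative only: the function name and parameters are hypothetical, and treating 'terminate' as sending SIGTERM by default is an assumption, since the commit message only calls it "the default behaviour as it has always been".

```python
import signal


def timeout_failure_signal(mode,
                           watchdog_signal=signal.SIGABRT,
                           terminate_signal=signal.SIGTERM):
    """Return the first signal for a given Timeout{Start,Stop}FailureMode= value."""
    if mode == "terminate":
        # Assumption: graceful termination starts with SIGTERM.
        return terminate_signal
    if mode == "abort":
        # Watchdog machinery: SIGABRT unless WatchdogSignal was changed.
        return watchdog_signal
    if mode == "kill":
        # Skip the graceful phase and send SIGKILL directly.
        return signal.SIGKILL
    raise ValueError(f"unknown failure mode: {mode!r}")
```

Passing a different `watchdog_signal` models a unit whose WatchdogSignal was changed.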
* manager: add CoredumpFilter= setting (Zbigniew Jędrzejewski-Szmek, 2020-04-09, 1 file, -0/+1)
|   Fixes #6685.
* core: add new LogNamespace= execution setting (Lennart Poettering, 2020-01-31, 1 file, -0/+1)
|
* core: initialize priority_set when parsing swap unit files (Lennart Poettering, 2020-01-09, 1 file, -0/+1)
|   Fixes: #14524
* pid1: fix the names of AllowedCPUs= and AllowedMemoryNodes= (Zbigniew Jędrzejewski-Szmek, 2019-11-25, 1 file, -2/+2)
|   The original PR was submitted with CPUSetCpus and CPUSetMems, which was later changed to AllowedCPUs and AllowedMemoryNodes everywhere (including the parser used by systemd-run), but not in the parser for unit files.
|
|   Since we already released -rc1, let's keep support for the old names. I think we can remove it in a release or two if anyone remembers to do that.
|
|   Fixes #14126. Follow-up for 047f5d63d7a1ab75073f8485e2f9b550d25b0772.
* core: remove unused prototypes (Zbigniew Jędrzejewski-Szmek, 2019-10-01, 1 file, -2/+0)
|
* cgroup: introduce support for cgroup v2 CPUSET controller (Pavel Hrdina, 2019-09-24, 1 file, -0/+2)
|   Introduce support for configuring cpus and mems for processes using the cgroup v2 CPUSET controller. This allows users to limit which cpus and memory NUMA nodes can be used by processes to better utilize system resources.
|
|   The cgroup v2 interfaces to control it are cpuset.cpus and cpuset.mems, where the requested configuration is written. However, it doesn't mean that the requested configuration will actually be used, as the parent cgroup may limit the cpus or mems as well. In order to reflect the real configuration, cgroup v2 provides the read-only files cpuset.cpus.effective and cpuset.mems.effective, which are exported to users as well.
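The difference between the requested and the effective configuration can be illustrated with a simplified Python model. The list format ("0-3,8") is the one used by the cpuset.cpus files; the function names are hypothetical, and the rule in `effective_cpus` (intersect with the parent, fall back to the parent when nothing is left) is a simplification of what the kernel reports in cpuset.cpus.effective, not a reimplementation of it.

```python
def parse_cpu_list(text):
    """Parse a cpuset list such as '0-3,8' into a set of integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        first, sep, last = part.partition("-")
        if sep:
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(first))
    return cpus


def effective_cpus(requested, parent_effective):
    """Simplified model: the parent cgroup caps what the child actually gets."""
    parent = parse_cpu_list(parent_effective)
    allowed = parse_cpu_list(requested) & parent
    # Assumption: with nothing requested (or nothing left), the parent's set applies.
    return allowed or parent
```

A child requesting CPUs 0-7 under a parent limited to 2-5 ends up with 2-5 in this model, which is why the effective files are worth exposing alongside the requested values.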
* core: Fix setting StatusUnitFormat from config files (Maciej Stanczew, 2019-09-17, 1 file, -0/+1)
|