Let's clean up validation/escaping of cgroup names, i.e. split out the code
that tests whether a name needs escaping. Return proper error codes, and extend
the test a bit.
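A minimal sketch of a split-out "needs escaping" test plus an escape function that returns negative errno codes. The rules are modelled on the usual cgroupfs clashes: a leading '_' or '.', the cgroup v1 attribute names, "cgroup.*", and "<controller>.<attr>". These rules, the short controller list, and the names cg_name_needs_escape()/cg_escape_name() are illustrative assumptions, not the code of this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset of controller names; not the full list. */
static const char *const controllers[] = { "cpu", "cpuacct", "io", "memory", "pids" };

/* Returns true if 'p' has to be prefixed with '_' before it can be used as a cgroup name. */
static bool cg_name_needs_escape(const char *p) {
        assert(p);

        /* Leading '_' is reserved for escaped names, leading '.' for special entries. */
        if (p[0] == '_' || p[0] == '.')
                return true;

        /* Names that clash with cgroupfs attribute files. */
        if (strcmp(p, "notify_on_release") == 0 ||
            strcmp(p, "release_agent") == 0 ||
            strcmp(p, "tasks") == 0 ||
            strncmp(p, "cgroup.", 7) == 0)
                return true;

        /* "<controller>.<something>" looks like a controller attribute file. */
        for (size_t i = 0; i < sizeof(controllers) / sizeof(controllers[0]); i++) {
                size_t n = strlen(controllers[i]);
                if (strncmp(p, controllers[i], n) == 0 && p[n] == '.')
                        return true;
        }

        return false;
}

/* Validates and escapes 'p'. Returns 0 and a newly allocated name in *ret, or a negative errno. */
static int cg_escape_name(const char *p, char **ret) {
        assert(p);
        assert(ret);

        if (p[0] == 0 || strchr(p, '/'))
                return -EINVAL;                 /* cannot be fixed by escaping */

        bool esc = cg_name_needs_escape(p);
        size_t n = strlen(p);
        if (n + esc > NAME_MAX)
                return -EINVAL;                 /* would not fit in one directory entry */

        char *s = malloc(n + esc + 1);
        if (!s)
                return -ENOMEM;

        if (esc)
                s[0] = '_';
        memcpy(s + esc, p, n + 1);              /* copy including the trailing NUL */
        *ret = s;
        return 0;
}
```

Keeping the predicate separate lets callers ask "is this name acceptable as-is?" without allocating.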
In C23, thread_local is a reserved keyword, and we therefore must not
redefine it. glibc already defines it for older standard versions under
the right conditions.
v2 by Yu Watanabe:
Move the definition to missing_threads.h, the same way we handle e.g.
missing syscalls or other missing definitions, and include it from the users.
Co-authored-by: Yu Watanabe <watanabe.yu+github@gmail.com>
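A simplified sketch of such a fallback, assuming GCC or Clang. It only defines thread_local when the compiler runs in a pre-C23 mode and nothing else (for example glibc's <threads.h>) has defined it yet. The check uses `__STDC_VERSION__ <= 201710L`, so C2x draft modes that report values like 202000L are excluded too. This is an illustration, not the actual contents of missing_threads.h, and bump_counter() is a hypothetical user.

```c
#include <assert.h>

/* Never touch thread_local in C23 mode, where it is a keyword, or if it is already defined. */
#if !defined(thread_local) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ <= 201710L)
#  if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#    define thread_local _Thread_local     /* C11/C17 spelling */
#  else
#    define thread_local __thread          /* GNU extension for pre-C11 modes */
#  endif
#endif

/* Each thread gets its own copy of this counter. */
static thread_local int per_thread_counter;

static int bump_counter(void) {
        return ++per_thread_counter;
}
```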
Given a cgroup path, cg_path_get_unit() retrieves the unit's name.
However, it drops the path to the unit's cgroup, so the result cannot be
used to fetch xattrs.
Introduce cg_path_get_unit_path(), which returns the path to the unit's
cgroup. This function behaves similarly to cg_path_get_unit() (checking
the validity and escaping of the unit's name).
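A minimal sketch of the idea. It assumes units sit below zero or more "*.slice" components, and it returns the prefix of the path up to and including the first non-slice component. The function name unit_path_from_cgroup() is hypothetical. The sketch skips the unit-name unescaping and validation that the real functions perform.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the n-byte component 's' ends in 'suffix'. */
static bool has_suffix(const char *s, size_t n, const char *suffix) {
        size_t k = strlen(suffix);
        return n >= k && memcmp(s + n - k, suffix, k) == 0;
}

/* E.g. "/system.slice/foo.service/sub" -> "/system.slice/foo.service".
 * Returns 0 on success, -ENXIO if no unit component exists, -ENOMEM on OOM. */
static int unit_path_from_cgroup(const char *path, char **ret) {
        assert(path);
        assert(ret);

        const char *p = path;
        for (;;) {
                p += strspn(p, "/");            /* skip separators */
                size_t n = strcspn(p, "/");     /* length of the current component */
                if (n == 0)
                        return -ENXIO;          /* only slices (or nothing): no unit */

                if (!has_suffix(p, n, ".slice")) {
                        /* First non-slice component is the unit: keep everything up to it. */
                        char *s = strndup(path, (size_t) (p + n - path));
                        if (!s)
                                return -ENOMEM;
                        *ret = s;
                        return 0;
                }
                p += n;
        }
}
```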
../src/basic/cgroup-util.c: In function ‘skip_session’:
../src/basic/cgroup-util.c:1241:32: error: incompatible types when returning type ‘_Bool’ but ‘const char *’ was expected
1241 | return false;
The name "def.h" originates from before the rule of "no needless abbreviations"
was established. Let's rename the file to clarify that it contains a collection
of various semi-related constants.
SIGKILL but processes still remain
After a SIGKILL is sent to a process, the process might disappear from
`cgroup.threads` but still show up in `cgroup.procs`. It then still remains in
the cgroup and causes the migration of new processes into `Delegate=yes` cgroups
to fail with `-EBUSY`. This is especially likely for heavyweight processes that
need more kernel CPU time to clean up.
Fix this by only returning 0 when both `cgroup.threads` and
`cgroup.procs` are empty.
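A minimal sketch of the "both files must be empty" rule, written against a plain directory path. The helper names and the return convention are hypothetical: cgroup_is_empty() returns 1 when empty, 0 when not, and a negative errno on failure. They are not the function changed by this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Returns 1 if dir/name lists at least one ID, 0 if it is empty, negative errno on error. */
static int file_has_entries(const char *dir, const char *name) {
        char path[4096];
        if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int) sizeof path)
                return -ENAMETOOLONG;

        FILE *f = fopen(path, "re");
        if (!f)
                return -errno;

        unsigned long id;
        int r = fscanf(f, "%lu", &id) == 1;     /* any parseable PID/TID means "not empty" */
        fclose(f);
        return r;
}

/* A SIGKILLed process may already be gone from cgroup.threads while still listed in
 * cgroup.procs, so only report "empty" once BOTH files are empty. */
static int cgroup_is_empty(const char *cgroup_dir) {
        assert(cgroup_dir);

        int r = file_has_entries(cgroup_dir, "cgroup.threads");
        if (r != 0)
                return r < 0 ? r : 0;

        r = file_has_entries(cgroup_dir, "cgroup.procs");
        if (r != 0)
                return r < 0 ? r : 0;

        return 1;
}
```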
Fixes CID#1322378.
Rename return parameters to "ret", use the ternary operator without a second
operand, rebreak comments, use isempty() more.
The variable is not useful outside of the loop (it'll always be NULL
after the loop is finished), so we can declare it inline in the loop.
This saves one variable declaration and reduces the chances that somebody
tries to use the variable outside of the loop.
For consistency, 'de' is used everywhere as the variable name.
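The pattern described above, shown with plain readdir(). count_entries() is a hypothetical example function, and systemd itself typically wraps such loops in its own macros. Because 'de' is declared in the for-statement, it goes out of scope once the loop ends, at which point it would only ever be NULL.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Counts directory entries other than "." and "..", or returns a negative errno. */
static int count_entries(const char *path) {
        assert(path);

        DIR *d = opendir(path);
        if (!d)
                return -errno;

        int n = 0;
        /* 'de' only exists inside the loop. readdir() errors are ignored in this sketch. */
        for (struct dirent *de; (de = readdir(d)); ) {
                if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                        continue;
                n++;
        }

        closedir(d);
        return n;
}
```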
Let's define two helpers strdupa_safe() + strndupa_safe() which do the
same as their non-safe counterparts, except that they abort if called
with allocations larger than ALLOCA_MAX.
This should ensure that all our alloca() based allocations are subject
to this limit.
As far as I can see, glibc offers three alloca()-based APIs: alloca()
itself, strndupa() and strdupa(). With this we now have replacements for
all of them that take the limit into account.
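A sketch of how such helpers can be built, assuming GCC/Clang statement expressions. The ALLOCA_MAX value below is an illustrative assumption, and the bodies are not the actual systemd macros. The helpers must be macros: memory from alloca() is released when the function that called alloca() returns, so a wrapper function could not hand the copy back to its caller.

```c
#define _GNU_SOURCE
#include <alloca.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed limit for illustration. */
#define ALLOCA_MAX (4U * 1024U * 1024U)

/* Copies at most n bytes of s onto the caller's stack; aborts if the copy plus its
 * trailing NUL would exceed ALLOCA_MAX. */
#define strndupa_safe(s, n)                                             \
        ({                                                              \
                const char *_t = (s);                                   \
                size_t _len = strnlen(_t, (n));                         \
                if (_len >= ALLOCA_MAX) {                               \
                        fputs("alloca() limit exceeded\n", stderr);     \
                        abort();                                        \
                }                                                       \
                char *_copy = alloca(_len + 1);                         \
                memcpy(_copy, _t, _len);                                \
                _copy[_len] = 0;                                        \
                _copy;                                                  \
        })

#define strdupa_safe(s) strndupa_safe((s), SIZE_MAX)
```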
basic: add recurse_dir() function as modern replacement for nftw()
That way we can easily call name_to_handle_at() on cgroupfs2 elsewhere.
getxattr_at_malloc() + listxattr_at_malloc()
Unfortunately fgetxattr() and flistxattr() don't work via O_PATH fds.
Let's thus add fallbacks to go via /proc/self/fd/ in these cases.
Also, let's merge all the various flavours we have here into single
implementations that can do everything we need:
1. malloc() loop handling
2. by fd, by path, or a combination (i.e. a proper openat()-like API)
3. working on O_PATH fds
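A minimal sketch covering points 1 and 3: a malloc() loop that grows the buffer on ERANGE, with a /proc/self/fd/ fallback when fgetxattr() rejects an O_PATH fd with EBADF. The name fgetxattr_malloc_sketch() is hypothetical. The sketch assumes /proc is mounted and does not implement the openat()-like path handling.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Reads xattr 'name' of 'fd' (which may be an O_PATH fd) into a malloc()ed buffer.
 * Returns the value length and stores the buffer in *ret, or returns a negative errno. */
static ssize_t fgetxattr_malloc_sketch(int fd, const char *name, char **ret) {
        assert(fd >= 0);
        assert(name);
        assert(ret);

        char proc_path[64];
        snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%i", fd);

        for (size_t size = 64;; size *= 2) {    /* malloc() loop: grow until the value fits */
                char *buf = malloc(size);
                if (!buf)
                        return -ENOMEM;

                ssize_t n = fgetxattr(fd, name, buf, size);
                if (n < 0 && errno == EBADF)
                        /* fgetxattr() refuses O_PATH fds; go via the /proc/self/fd/ symlink. */
                        n = getxattr(proc_path, name, buf, size);
                if (n >= 0) {
                        *ret = buf;
                        return n;
                }

                int saved = errno;
                free(buf);
                /* Only retry on "buffer too small"; Linux caps xattr values at 64 KiB. */
                if (saved != ERANGE || size >= 64 * 1024)
                        return -saved;
        }
}
```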
It returns the cgroup ID for a cgroup path.
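A sketch of one way to do this, assuming the path lives on a cgroup v2 (cgroupfs2) mount. There, the file handle that name_to_handle_at() returns for a cgroup directory carries the 64-bit cgroup ID. The function name is hypothetical, and the sketch does not verify that the path really is on cgroupfs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores the cgroup ID of 'path' in *ret; returns 0 or a negative errno. */
static int cgroup_id_from_path(const char *path, uint64_t *ret) {
        assert(path);
        assert(ret);

        /* struct file_handle followed by room for exactly 8 handle bytes. */
        union {
                struct file_handle fh;
                uint8_t space[offsetof(struct file_handle, f_handle) + sizeof(uint64_t)];
        } h = { .fh.handle_bytes = sizeof(uint64_t) };
        int mnt_id;

        if (name_to_handle_at(AT_FDCWD, path, &h.fh, &mnt_id, 0) < 0)
                return -errno;
        if (h.fh.handle_bytes != sizeof(uint64_t))
                return -EOPNOTSUPP;             /* not an 8-byte, cgroupfs-style handle */

        memcpy(ret, h.fh.f_handle, sizeof(uint64_t));
        return 0;
}
```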
This commit introduces all the logic to load and attach the BPF
programs to restrict network interfaces when a unit specifying it is
loaded.
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
We recently started making more use of malloc_usable_size() and rely on
it (see the string_erase() story). Given that we don't really support
systems where malloc_usable_size() cannot be trusted beyond statistics
anyway, let's go fully in and rework GREEDY_REALLOC() on top of it:
instead of passing around and maintaining the currently allocated size
everywhere, let's just derive it automatically from
malloc_usable_size().
I am mostly after this for the simplicity this brings. It also brings
minor efficiency improvements I guess, but things become so much nicer
to look at if we can avoid these allocation size variables everywhere.
Note that the malloc_usable_size() man page says relying on it wasn't
"good programming practice", but I think it does this for reasons that
don't apply here: the greedy realloc logic specifically doesn't rely on
the returned extra size, beyond the fact that it is equal or larger than
what was requested.
(This commit was supposed to be a quick patch btw, but apparently we use
the greedy realloc stuff quite a bit across the codebase, so this ends
up touching *a lot* of code.)
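A minimal sketch of a greedy realloc that derives the current capacity from malloc_usable_size() instead of a caller-maintained size variable. The names greedy_realloc_sketch()/GREEDY_REALLOC_SKETCH() and the growth factor of 2 are assumptions. Only the property stated above is relied on: the usable size is at least what was requested.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <malloc.h>
#include <stdint.h>
#include <stdlib.h>

/* Ensures *p has room for at least 'need' elements of 'size' bytes each. Returns the
 * (possibly moved) buffer, or NULL on overflow/OOM, in which case *p is left untouched. */
static void *greedy_realloc_sketch(void **p, size_t need, size_t size) {
        assert(p);
        assert(size > 0);

        /* No separate "allocated" variable: ask the allocator how big the block is. */
        if (*p && malloc_usable_size(*p) / size >= need)
                return *p;

        if (need > SIZE_MAX / 2 / size)
                return NULL;                    /* need * 2 * size would overflow */

        size_t newalloc = need * 2 * size;      /* over-allocate so repeated appends amortize */
        if (newalloc < 64)
                newalloc = 64;

        void *q = realloc(*p, newalloc);
        if (!q)
                return NULL;

        *p = q;
        return q;
}

#define GREEDY_REALLOC_SKETCH(array, need) \
        greedy_realloc_sketch((void **) &(array), (need), sizeof((array)[0]))
```

Callers now only pass the array and the number of elements they need, which is the simplification the commit is after.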
Standard cgroup harness for bpf feature.
Add CGROUP_MASK_BPF_FOREIGN to CGROUP_MASK_BPF and standard cgroup
context harness.
This should make it easier to remove those warnings when the compiler
gets smarter. Not sure if I got them all...
Double space before the comment start to make it easier to separate from the
preceding line.
Wherever we read virtual files, we should use
read_full_virtual_file() to make sure we get a consistent response,
given how weird the kernel's handling of partial reads on such file
systems is.
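A sketch of the underlying idea, not read_full_virtual_file() itself. Fetch the whole file with a single read starting at offset 0, and if the buffer came back completely full, retry from the start with a bigger buffer instead of reading the rest in further chunks. The function name and the 4 KiB to 4 MiB size range are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads 'path' with one pread() per attempt. On success stores a NUL-terminated buffer
 * in *ret (and its length in *ret_size if non-NULL) and returns 0; else a negative errno. */
static int read_virtual_file_sketch(const char *path, char **ret, size_t *ret_size) {
        assert(path);
        assert(ret);

        int fd = open(path, O_RDONLY | O_CLOEXEC | O_NOCTTY);
        if (fd < 0)
                return -errno;

        for (size_t size = 4096; size <= 4 * 1024 * 1024; size *= 2) {
                char *buf = malloc(size + 1);
                if (!buf) {
                        close(fd);
                        return -ENOMEM;
                }

                ssize_t n = pread(fd, buf, size, 0);    /* single read from offset 0 */
                if (n < 0) {
                        int r = -errno;
                        free(buf);
                        close(fd);
                        return r;
                }

                if ((size_t) n < size) {        /* short read: we got everything in one go */
                        buf[n] = 0;
                        close(fd);
                        *ret = buf;
                        if (ret_size)
                                *ret_size = (size_t) n;
                        return 0;
                }

                free(buf);                      /* buffer full: file may be longer, retry */
        }

        close(fd);
        return -E2BIG;
}
```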
When the test suite is being run in a foreign environment,
/sys/fs/cgroup might not be set up in a way that we recognize.
Returning ENOMEDIUM causes the tests to be skipped in this case.
Bug: https://bugs.gentoo.org/771819
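A simplified sketch of the classification. A cgroup2 mount means the unified layout, tmpfs means a legacy/hybrid layout, and anything else yields -ENOMEDIUM so that tests can skip. The enum and function names are hypothetical, and real detection also has to tell legacy and hybrid setups apart.

```c
#include <assert.h>
#include <errno.h>
#include <linux/magic.h>
#include <sys/vfs.h>

enum { CGROUP_UNIFIED = 1, CGROUP_LEGACY_OR_HYBRID = 2 };

/* Maps a statfs() f_type of /sys/fs/cgroup to a layout, or -ENOMEDIUM if unrecognized. */
static int cgroup_layout_from_fstype(long f_type) {
        if (f_type == CGROUP2_SUPER_MAGIC)
                return CGROUP_UNIFIED;          /* cgroup v2 mounted directly */
        if (f_type == TMPFS_MAGIC)
                return CGROUP_LEGACY_OR_HYBRID; /* tmpfs with controller mounts below it */
        return -ENOMEDIUM;                      /* unknown setup: callers such as tests skip */
}

static int cgroup_layout(const char *root) {
        assert(root);

        struct statfs fs;
        if (statfs(root, &fs) < 0)
                return -errno;
        return cgroup_layout_from_fstype((long) fs.f_type);
}
```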
Follow-up for 0fa7b50053.
Make (user) instance aware of delegated cgroup controllers
The systemd user instance assumed the same controllers are available to it as to
PID 1. That is not true in general: in v1 (legacy, hybrid) we don't delegate any
controllers to anyone, and in v2 (unified) we may delegate only a subset of
controllers.
The user instance would fail silently when the controller cgroup cannot
be created or the controller cannot be enabled on the unified hierarchy.
The changes in 7b63961415 ("cgroup: Swap cgroup v1 deletion and
migration") caused some attempts of operating on non-delegated
controllers to be logged.
Make the user instance first check which controllers are available to it
and narrow operations down to only these controllers. The original checks are
kept in place.
Note that daemon-reexec needs to be invoked in order to update the set
of enabled controllers after a change.
Fixes: #18047
Fixes: #17862
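On the unified hierarchy, the controllers available to a cgroup are listed in its "cgroup.controllers" file. Below is a minimal sketch of turning that list into a mask that operations can then be narrowed to, e.g. `mask &= mask_from_controllers(contents)`. The controller table, its bit assignment, and the function name are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative controller set: bit i corresponds to controller_names[i]. */
static const char *const controller_names[] = { "cpu", "io", "memory", "pids" };

/* Parses space-separated controller names, e.g. "cpu memory pids\n", into a bitmask.
 * Unknown names are ignored. */
static uint32_t mask_from_controllers(const char *s) {
        assert(s);
        uint32_t mask = 0;

        while (*s) {
                s += strspn(s, " \n");          /* skip separators */
                size_t n = strcspn(s, " \n");   /* length of the next name */
                if (n == 0)
                        break;

                for (size_t i = 0; i < sizeof(controller_names) / sizeof(controller_names[0]); i++)
                        if (strlen(controller_names[i]) == n && memcmp(s, controller_names[i], n) == 0)
                                mask |= UINT32_C(1) << i;

                s += n;
        }

        return mask;
}
```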
The function controller_is_accessible() doesn't really do much in the case
of the unified hierarchy. Move the common parts into cg_get_path_and_check()
and make the controller check v1-specific. This is refactoring only.
This fixes two checks where we compare string sizes when validating against
FILENAME_MAX. In both cases the check apparently wants to verify that the
name fits in a filename, but that's not actually what FILENAME_MAX can
be used for, as it (in contrast to what the name suggests) actually
encodes the maximum length of a path.
In both cases the stricter check doesn't change much in practice, but the
use of FILENAME_MAX is still misleading and typically wrong.
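To illustrate the difference: on Linux with glibc, FILENAME_MAX from <stdio.h> is a path-sized limit (4096), while NAME_MAX from <limits.h> bounds a single path component (255). A check meant for one directory entry therefore belongs against NAME_MAX. The helper name below is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if 'name' is non-empty and short enough to be a single file name. */
static bool name_fits_in_filename(const char *name) {
        assert(name);

        size_t n = strlen(name);
        return n > 0 && n <= NAME_MAX;          /* NAME_MAX, not FILENAME_MAX */
}
```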
oomd: implement avoid/omit support for cgroups
There may be situations where a cgroup should be protected from killing
or deprioritized as a candidate. In FB's oomd, xattrs are used to bias oomd
away from supervisor cgroups and towards worker cgroups in container
tasks. On desktops this can be used to protect important units with
unpredictable resource consumption.
The patch allows systemd-oomd to understand 2 xattrs:
"user.oomd_avoid" and "user.oomd_omit". If systemd-oomd sees these
xattrs set to 1 on a candidate cgroup (i.e. while attempting to kill something)
AND the cgroup is owned by root, it will either deprioritize the cgroup as
a candidate (avoid) or remove it completely as a candidate (omit).
Usage is restricted to root-owned cgroups to prevent situations where an
unprivileged user can set their own cgroups lower in the kill priority than
another user's (and prevent them from omitting their units from
systemd-oomd killing).
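A minimal sketch of the decision described above. The xattrs are honoured only if the cgroup directory is owned by root and the value is exactly "1". The enum, the function names, and the choice that "omit" takes precedence over "avoid" when both are set are assumptions of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/xattr.h>

enum oomd_pref { OOMD_PREF_NONE, OOMD_PREF_AVOID, OOMD_PREF_OMIT };

/* True if xattr 'name' on 'path' is set to exactly "1". */
static bool xattr_is_one(const char *path, const char *name) {
        char v[2];
        ssize_t n = getxattr(path, name, v, sizeof v);
        return n == 1 && v[0] == '1';
}

static enum oomd_pref oomd_preference(const char *cgroup_path) {
        assert(cgroup_path);

        /* Ignore the xattrs unless root owns the cgroup, so users can't bias against others. */
        struct stat st;
        if (stat(cgroup_path, &st) < 0 || st.st_uid != 0)
                return OOMD_PREF_NONE;

        if (xattr_is_one(cgroup_path, "user.oomd_omit"))       /* assumed to win over avoid */
                return OOMD_PREF_OMIT;
        if (xattr_is_one(cgroup_path, "user.oomd_avoid"))
                return OOMD_PREF_AVOID;
        return OOMD_PREF_NONE;
}
```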
Also use _cleanup_free_ in one more place.
The trailing NULL in the argument list is now implied (similar to
what we already have in place in strjoin()).
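The usual way to make a trailing NULL implicit is a variadic macro that appends the sentinel itself. Below, concat()/concat_internal() are hypothetical names that illustrate the pattern, not the function changed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates all arguments up to a NULL sentinel into a newly allocated string. */
__attribute__((__sentinel__))
static char *concat_internal(const char *first, ...) {
        va_list ap;
        size_t len = 0;

        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, const char *))
                len += strlen(s);
        va_end(ap);

        char *r = malloc(len + 1);
        if (!r)
                return NULL;

        char *p = r;
        va_start(ap, first);
        for (const char *s = first; s; s = va_arg(ap, const char *)) {
                size_t n = strlen(s);
                memcpy(p, s, n);
                p += n;
        }
        va_end(ap);

        *p = 0;
        return r;
}

/* The macro supplies the (correctly typed) sentinel, so callers can no longer forget it. */
#define concat(...) concat_internal(__VA_ARGS__, (const char *) NULL)
```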
This adds the hookups so it can be read with the usual systemd
utilities. It is used by systemd-oomd in later commits.
With cgroup v2, the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. A cgroup can be frozen by writing "1"
to the file, and the kernel will send us a notification through
"cgroup.events" once the operation has finished and the processes in the
cgroup have entered a quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse, and process
execution is resumed.
This commit exposes the above low-level functionality through systemd's DBus
API. Each unit type must provide a specialized implementation for these
methods; otherwise, we return an error. So far only the service, scope, and
slice unit types provide the support. Whether a given unit has the support
can be checked using the CanFreeze() DBus property.
Note that the DBus API behaves synchronously: we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that the
requested operation has completed.
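A minimal sketch of the low-level request side. The function writes "1" or "0" to <cgroup>/cgroup.freeze. This only requests the state change; completion shows up later as the "frozen" key in cgroup.events, which a caller would have to watch (e.g. with inotify) before replying. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Requests freezing (true) or thawing (false) of the cgroup at 'cgroup_dir'.
 * Returns 0 once the request was written, or a negative errno. */
static int cgroup_set_frozen(const char *cgroup_dir, bool freeze) {
        assert(cgroup_dir);

        char path[4096];
        if (snprintf(path, sizeof path, "%s/cgroup.freeze", cgroup_dir) >= (int) sizeof path)
                return -ENAMETOOLONG;

        FILE *f = fopen(path, "we");
        if (!f)
                return -errno;

        int r = fputs(freeze ? "1" : "0", f) < 0 ? -EIO : 0;
        if (fclose(f) != 0 && r == 0)           /* buffered write errors surface here */
                r = -errno;
        return r;
}
```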
Callers of cg_get_keyed_attribute_full() can now specify via a flag whether
missing keys in the cgroup attribute file are OK or not. Wrappers for both the
strict and the graceful version are also provided.
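A sketch of strict versus graceful lookup of keys in a keyed attribute such as "populated 1\nfrozen 0\n". The flag name, the function signature, and the -ENXIO error for missing keys in strict mode are this sketch's own choices, not the real cg_get_keyed_attribute_full() interface. In graceful mode, missing keys are simply left as NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define KEYED_ATTRIBUTE_GRACEFUL 1u             /* hypothetical flag: missing keys are OK */

static void free_values(char **v, size_t n) {
        for (size_t i = 0; i < n; i++) {
                free(v[i]);
                v[i] = NULL;
        }
}

/* Looks up n keys in "<key> <value>" lines. Found values are strndup()ed into ret_values[i],
 * missing ones stay NULL. Returns the number found, or a negative errno. */
static int parse_keyed_attribute(const char *contents, const char *const *keys, size_t n,
                                 unsigned flags, char **ret_values) {
        size_t found = 0;

        assert(contents);
        for (size_t i = 0; i < n; i++)
                ret_values[i] = NULL;

        for (const char *line = contents; *line; ) {
                size_t len = strcspn(line, "\n");
                const char *sp = memchr(line, ' ', len);

                for (size_t i = 0; sp && i < n; i++) {
                        size_t klen = (size_t) (sp - line);
                        if (ret_values[i] || strlen(keys[i]) != klen || memcmp(line, keys[i], klen) != 0)
                                continue;

                        ret_values[i] = strndup(sp + 1, len - klen - 1);
                        if (!ret_values[i]) {
                                free_values(ret_values, n);
                                return -ENOMEM;
                        }
                        found++;
                }

                line += len + (line[len] == '\n');      /* advance past the newline, if any */
        }

        if (found < n && !(flags & KEYED_ATTRIBUTE_GRACEFUL)) {
                free_values(ret_values, n);
                return -ENXIO;                  /* strict mode: every key must be present */
        }

        return (int) found;
}
```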
When nothing at all is mounted at /sys/fs/cgroup, the fs.f_type is
SYSFS_MAGIC (0x62656572) which results in the confusing debug log:
"Unknown filesystem type 62656572 mounted on /sys/fs/cgroup."
Instead, if the f_type is SYSFS_MAGIC, a more accurate message is:
"No filesystem is currently mounted on /sys/fs/cgroup."
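A small sketch of the message selection described above. When nothing is mounted, statfs() on /sys/fs/cgroup reports the sysfs mount underneath, so SYSFS_MAGIC gets its own message. The function name and the caller-provided buffer are assumptions.

```c
#include <assert.h>
#include <linux/magic.h>
#include <stdio.h>
#include <string.h>

/* Returns a human-readable description of the f_type found on /sys/fs/cgroup. */
static const char *describe_cgroup_fstype(unsigned long f_type, char *buf, size_t size) {
        assert(buf);

        if (f_type == SYSFS_MAGIC)
                return "No filesystem is currently mounted on /sys/fs/cgroup.";

        snprintf(buf, size, "Unknown filesystem type %lx mounted on /sys/fs/cgroup.", f_type);
        return buf;
}
```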