core: rework how we track which PIDs to watch for a unit

Previously, we'd maintain two hashmaps keyed by PIDs, pointing to the Units
interested in SIGCHLD events for them. This scheme allowed a specific PID to
be watched by exactly 0, 1 or 2 units.

With this rework this is replaced by a single hashmap which is primarily
keyed by the PID and points to a Unit interested in it. However, it is
optionally also keyed by the negated PID, in which case it points to a
NULL-terminated array of additional Unit objects also interested. This scheme
means arbitrary numbers of Units may now watch the same PID.

Runtime and memory behaviour should not be impacted by this change, as for
the common case (i.e. each PID only watched by a single unit) behaviour stays
the same, but for the uncommon case (a PID watched by more than one unit) we
only pay with a single additional memory allocation for the array.

Why all this? Primarily, because allowing exactly two units to watch a
specific PID is not sufficient for some niche cases, as processes can belong
to more than one unit these days:

1. sd_notify() with MAINPID= can be used to attach a process from a different
   cgroup to multiple units.

2. Similarly, the PIDFile= setting in unit files can be used for such setups.

3. By creating a scope unit a main process of a service may join a different
   unit, too.

4. On cgroupsv1 we frequently end up watching all processes remaining in a
   scope, and if a process opens lots of scopes one after the other it might
   thus end up being watched by many of them.

This patch hence removes the 2-unit-per-PID limit. It also makes a couple of
other changes, some of them quite relevant:

- manager_get_unit_by_pid() (and the bus call wrapping it) when there's
  ambiguity will prefer returning the Unit the process belongs to based on
  cgroup membership, and only check the watch-pids hashmap if that fails. This
  change in logic is probably more in line with what people expect and makes
  things more stable as each process can belong to exactly one cgroup only.

- Every SIGCHLD event is now dispatched to all units interested in its PID.
  Previously, there was some magic conditionalization: the SIGCHLD would only
  be dispatched to the unit if it was interested in a single PID only, or the
  PID was the unit's control or main PID, or we hadn't dispatched a single
  SIGCHLD to the unit in the current event loop iteration yet. These rules
  were quite arbitrary and also redundant, as the per-unit handlers would
  filter the PIDs a second time anyway. With this change we'll hence relax the
  rules: all we do now is dispatch every SIGCHLD event exactly once to each
  unit interested in it, and it's up to the unit to then use or ignore this.
  We use a generation counter in the unit to ensure that we only invoke the
  unit handler once for each event, protecting us from confusion if a unit is
  both associated with a specific PID through cgroup membership and through
  the "watch_pids" logic. It also protects us from being confused if the
  "watch_pids" hashmap is altered while we are dispatching to it (which is a
  very likely case).

- sd_notify() message dispatching has been reworked to be very similar to
  SIGCHLD handling now. A generation counter is used for dispatching as well.

This also adds a new test that validates that "watch_pid" registration and
unregistration work correctly.
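
For illustration only, the single-hashmap encoding described above might be
sketched roughly as follows. This is not the code from the patch: the tiny
open-addressing table, the reduced Unit struct, and the names slot_for(),
table_get(), table_put(), watch_pid() and count_watchers() are all
hypothetical stand-ins for systemd's real Hashmap and Unit machinery, and
unregistration is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in: systemd's real Unit type is far richer. */
typedef struct Unit { const char *id; } Unit;

/* Toy replacement for Hashmap: fixed size, linear probing, no removal. */
#define N_SLOTS 64
typedef struct Slot { pid_t key; void *value; } Slot;   /* key 0 = empty */
static Slot table[N_SLOTS];

static Slot *slot_for(pid_t key) {
        size_t i = (size_t) (unsigned) key % N_SLOTS;
        while (table[i].key != 0 && table[i].key != key)
                i = (i + 1) % N_SLOTS;
        return &table[i];
}

static void *table_get(pid_t key) {
        Slot *s = slot_for(key);
        return s->key == key ? s->value : NULL;
}

static void table_put(pid_t key, void *value) {
        Slot *s = slot_for(key);
        s->key = key;
        s->value = value;
}

/* Register u as interested in pid. Returns 0 on success, -1 on OOM. */
int watch_pid(Unit *u, pid_t pid) {
        assert(pid > 0);   /* the negative range is reserved for the arrays */

        Unit *first = table_get(pid);
        if (!first) {
                /* Common case: first watcher, stored directly under +pid. */
                table_put(pid, u);
                return 0;
        }
        if (first == u)
                return 0;

        /* Uncommon case: additional watchers go into a NULL-terminated
         * array stored under the negated PID. */
        Unit **array = table_get(-pid);
        size_t n = 0;
        if (array)
                for (; array[n]; n++)
                        if (array[n] == u)
                                return 0;   /* already registered */

        Unit **grown = realloc(array, (n + 2) * sizeof(Unit *));
        if (!grown)
                return -1;
        grown[n] = u;
        grown[n + 1] = NULL;
        table_put(-pid, grown);
        return 0;
}

/* Number of units currently watching pid. */
size_t count_watchers(pid_t pid) {
        size_t n = table_get(pid) ? 1 : 0;
        Unit **array = table_get(-pid);
        for (size_t i = 0; array && array[i]; i++)
                n++;
        return n;
}
```

In this sketch, registering three different units for the same PID leaves the
first one stored under the PID itself and the other two in the array under the
negated PID. A PID with a single watcher never triggers the extra allocation.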
author: Lennart Poettering <lennart@poettering.net> 2018-01-12 13:41:05 +0100
committer: Lennart Poettering <lennart@poettering.net> 2018-01-23 21:29:31 +0100
commit: 62a769136df4065ce0711625e1e78ec996447862 (patch)
tree: 360b89fcda490f4936cf9ed01c391d8b2223d1d3 /src/core/manager.h
parent: 11aef522c16d739653228ef3d5925b6fb25b9d8b (diff)
download: systemd-62a769136df4065ce0711625e1e78ec996447862.tar.gz
1 files changed, 14 insertions, 9 deletions
diff --git a/src/core/manager.h b/src/core/manager.h
index 3af780f866..90d5258b53 100644
--- a/src/core/manager.h
+++ b/src/core/manager.h
@@ -145,14 +145,14 @@ struct Manager {
 
         sd_event *event;
 
-        /* We use two hash tables here, since the same PID might be
-         * watched by two different units: once the unit that forked
-         * it off, and possibly a different unit to which it was
-         * joined as cgroup member. Since we know that it is either
-         * one or two units for each PID we just use to hashmaps
-         * here. */
-        Hashmap *watch_pids1;  /* pid => Unit object n:1 */
-        Hashmap *watch_pids2;  /* pid => Unit object n:1 */
+        /* This maps PIDs we care about to units that are interested in them. We allow multiple units to be interested in
+         * the same PID and multiple PIDs to be relevant to the same unit. Since in most cases only a single unit will
+         * be interested in the same PID we use a somewhat special encoding here: the first unit interested in a PID is
+         * stored directly in the hashmap, keyed by the PID unmodified. If there are other units interested too they'll
+         * be stored in a NULL-terminated array, and keyed by the negative PID. This is safe as pid_t is signed and
+         * negative PIDs are not used for regular processes but process groups, which we don't care about in this
+         * context, but this allows us to use the negative range for our own purposes. */
+        Hashmap *watch_pids;  /* pid => unit as well as -pid => array of units */
 
         /* A set contains all units which cgroup should be refreshed after startup */
         Set *startup_units;
@@ -350,8 +350,13 @@ struct Manager {
 
         int first_boot; /* tri-state */
 
-        /* prefixes of e.g. RuntimeDirectory= */
+        /* Prefixes of e.g. RuntimeDirectory= */
         char *prefix[_EXEC_DIRECTORY_TYPE_MAX];
+
+        /* Used in the SIGCHLD and sd_notify() message invocation logic to avoid that we dispatch the same event
+         * multiple times on the same unit. */
+        unsigned sigchldgen;
+        unsigned notifygen;
 };
 
 #define MANAGER_IS_SYSTEM(m) ((m)->unit_file_scope == UNIT_FILE_SYSTEM)
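
The role of the new sigchldgen field can be illustrated with a minimal
sketch. It assumes that each unit carries a matching generation field, as the
commit message describes. The reduced Manager and Unit structs, the n_invoked
counter, and the functions invoke_sigchld_once() and dispatch_sigchld() are
hypothetical and only model the idea, not the actual dispatch code in
manager.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced versions of the real structs. */
typedef struct Unit {
        const char *id;
        unsigned sigchldgen;   /* generation of the last event this unit saw */
        unsigned n_invoked;    /* demo-only: how often the handler ran */
} Unit;

typedef struct Manager {
        unsigned sigchldgen;   /* bumped once per SIGCHLD event */
} Manager;

/* Run the unit's handler unless it already ran for the current event. */
static void invoke_sigchld_once(Manager *m, Unit *u) {
        if (u->sigchldgen == m->sigchldgen)
                return;                    /* already dispatched this event */
        u->sigchldgen = m->sigchldgen;
        u->n_invoked++;                    /* stand-in for the real handler */
}

/* Dispatch one SIGCHLD event to every candidate unit. The candidate list
 * may contain NULL entries and may name the same unit more than once, e.g.
 * once via cgroup membership and once via the watch_pids hashmap. */
void dispatch_sigchld(Manager *m, Unit **candidates, size_t n) {
        m->sigchldgen++;                   /* start a new generation */
        for (size_t i = 0; i < n; i++)
                if (candidates[i])
                        invoke_sigchld_once(m, candidates[i]);
}
```

With a candidate list that names the same unit twice, the handler still runs
only once per event for that unit, because the second lookup finds the unit's
generation already equal to the manager's. The notifygen field presumably
plays the same role for sd_notify() messages.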