author | Lennart Poettering <lennart@poettering.net> | 2022-03-17 13:46:12 +0100 |
---|---|---|
committer | Lennart Poettering <lennart@poettering.net> | 2022-03-17 19:08:12 +0100 |
commit | 50ae2966d20b0b4a19def060de3b966b7a70b54a (patch) | |
tree | d0c072dfc682f5d2e39439d8b664c76a359eba37 /src/basic/user-util.h | |
parent | 264caae299aa8f42f20460ad3280add657a3747f (diff) | |
download | systemd-50ae2966d20b0b4a19def060de3b966b7a70b54a.tar.gz |
nspawn: make sure host root can write to the uidmapped mounts we prepare for the container payload
When using user namespaces in conjunction with uidmapped mounts, nspawn
so far set up two uidmappings:
1. One that is used for the uidmapped mount and that maps the UID range
0…65535 on the backing fs to some high UID range X…X+65535 on the
uidmapped fs. (Let's call this mapping the "mount mapping")
2. One that is used for the user namespace the container payload
processes run in, that maps X…X+65535 back to 0…65535. (Let's call
this one the "process mapping").
These mappings hence are pretty much identical, one just moves things up
and one back down. (Reminder: we do all this so that the processes can
run under high UIDs while running off file systems that require no
recursive chown()ing, i.e. we want processes with high UID range but
files with low UID range.)
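To make the "up and back down" relationship concrete, here is a minimal sketch in C that models the two mappings as plain range translations. It is an illustration only, not nspawn's code: the `id_map_entry` struct, all function names, and the parameter `x` (standing in for "X") are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One contiguous ID mapping entry: `count` IDs starting at `from` map to IDs starting at `to`.
 * (Hypothetical model type, not a kernel or systemd structure.) */
struct id_map_entry {
        uint32_t from;
        uint32_t to;
        uint32_t count;
};

/* Translate `id` through one entry. Returns false if `id` lies outside the mapped range. */
bool id_map_entry_apply(const struct id_map_entry *e, uint32_t id, uint32_t *ret) {
        if (id < e->from || id - e->from >= e->count)
                return false;

        *ret = e->to + (id - e->from);
        return true;
}

/* "Mount mapping": backing fs UIDs 0…65535 become X…X+65535 on the uidmapped fs. */
struct id_map_entry mount_mapping(uint32_t x) {
        return (struct id_map_entry) { .from = 0, .to = x, .count = 65536 };
}

/* "Process mapping": X…X+65535 become 0…65535 for the container payload. */
struct id_map_entry process_mapping(uint32_t x) {
        return (struct id_map_entry) { .from = x, .to = 0, .count = 65536 };
}

/* Follow a file owner from the backing fs through both mappings to the UID the payload sees. */
bool backing_uid_to_container_uid(uint32_t x, uint32_t backing_uid, uint32_t *ret) {
        struct id_map_entry m = mount_mapping(x), p = process_mapping(x);
        uint32_t on_mount;

        return id_map_entry_apply(&m, backing_uid, &on_mount) &&
                id_map_entry_apply(&p, on_mount, ret);
}
```

With an assumed X of 524288, a file owned by UID 1000 on the backing fs shows up as UID 525288 on the uidmapped fs and as UID 1000 again inside the container, so no chown() of the backing files is needed.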
This creates one problem, i.e. issue #20989: if nspawn (which runs as
host root, i.e. host UID 0) wants to add inodes to the uidmapped mount
it can't do that, since host UID 0 is not defined in the mount mapping
(only the X…X+65535 range is, after all, and X > 0), and processes whose
UID is not mapped in a uidmapped fs cannot create inodes in it since
those would be owned by an unmapped UID, which then triggers
the famous EOVERFLOW error.
Let's fix this, by explicitly including an entry for the host UID 0 in
the mount mapping. Specifically, we'll extend the mount mapping to map
UID 2147483646 (which is INT32_MAX-1, see code for an explanation why I
picked this one) of the backing fs to UID 0 on the uidmapped fs. This
way nspawn can create inodes on the uidmapped fs as it likes (which will
then actually be owned by UID 2147483646 on the backing fs), and as it
always did. Note that we do *not* create a similar entry in the process
mapping. Thus any files created by nspawn that way (and not chown()ed to
something better) will appear as unmapped (i.e. as overflowuid/"nobody")
in the container payload. And that's good. Of course, the latter is
mostly theoretical, as nspawn should generally chown() the inodes it
creates to UID ranges that actually make sense for the container (and we
generally already do this correctly), but it's good to know that we are
safe here, given we might accidentally forget to chown() some inodes we
create.
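The effect of the extra entry can be sketched as a small C model. This is a simplified illustration under stated assumptions, not the actual nspawn or kernel logic: the `id_map_entry` struct, the helper names, and `OVERFLOW_UID` (assumed to be the default overflowuid of 65534) are hypothetical; only `UID_MAPPED_ROOT` and its value come from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UID_MAPPED_ROOT ((uint32_t) (INT32_MAX-1)) /* 2147483646, the value chosen in the patch */
#define OVERFLOW_UID ((uint32_t) 65534)            /* assumption: default overflowuid ("nobody") */

/* Hypothetical model of one mount mapping entry. */
struct id_map_entry {
        uint32_t backing; /* first UID on the backing fs */
        uint32_t mounted; /* first UID as seen through the uidmapped mount */
        uint32_t count;
};

/* Fill `map` (room for 2 entries) with the mount mapping; returns the number of entries.
 * With `include_host_root` the additional entry described above is appended. */
size_t make_mount_mapping(uint32_t x, bool include_host_root, struct id_map_entry map[2]) {
        size_t n = 0;

        map[n++] = (struct id_map_entry) { .backing = 0, .mounted = x, .count = 65536 };
        if (include_host_root)
                map[n++] = (struct id_map_entry) { .backing = UID_MAPPED_ROOT, .mounted = 0, .count = 1 };

        return n;
}

/* Which backing fs UID would own an inode created by `caller_uid` through the uidmapped mount?
 * Returns 0 on success, or -EOVERFLOW if the caller's UID has no entry in the mount mapping. */
int mount_side_to_backing(const struct id_map_entry *map, size_t n, uint32_t caller_uid, uint32_t *ret) {
        for (size_t i = 0; i < n; i++)
                if (caller_uid >= map[i].mounted && caller_uid - map[i].mounted < map[i].count) {
                        *ret = map[i].backing + (caller_uid - map[i].mounted);
                        return 0;
                }

        return -EOVERFLOW;
}

/* How does a UID seen on the uidmapped mount appear to the payload? The process mapping only
 * covers X…X+65535, so anything else (including host UID 0) shows up as the overflow UID. */
uint32_t mount_side_to_container(uint32_t x, uint32_t mounted_uid) {
        if (mounted_uid >= x && mounted_uid - x < 65536)
                return mounted_uid - x;

        return OVERFLOW_UID;
}
```

In this model, host UID 0 yields -EOVERFLOW against the old single-entry mapping, while the extended mapping resolves it to backing fs UID 2147483646. Because the process mapping gets no matching entry, such an inode appears as 65534 to the payload unless it is chown()ed into the container's range.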
Net effect: the two mappings will not be identical anymore. The mount
mapping has one entry more, and the only reason it exists is so that
nspawn can access the uidmapped fs reasonably independently from any
process mapping.
Fixes: #20989
Diffstat (limited to 'src/basic/user-util.h')
-rw-r--r-- | src/basic/user-util.h | 13 |
1 files changed, 13 insertions, 0 deletions
diff --git a/src/basic/user-util.h b/src/basic/user-util.h
index 40979d1080..e1692c4f66 100644
--- a/src/basic/user-util.h
+++ b/src/basic/user-util.h
@@ -67,6 +67,19 @@ int take_etc_passwd_lock(const char *root);
 #define UID_NOBODY ((uid_t) 65534U)
 #define GID_NOBODY ((gid_t) 65534U)
 
+/* If REMOUNT_IDMAP_HOST_ROOT is set for remount_idmap() we'll include a mapping here that maps the host root
+ * user accessing the idmapped mount to the this user ID on the backing fs. This is the last valid UID in the
+ * *signed* 32bit range. You might wonder why precisely use this specific UID for this purpose? Well, we
+ * definitely cannot use the first 0…65536 UIDs for that, since in most cases that's precisely the file range
+ * we intend to map to some high UID range, and since UID mappings have to be bijective we thus cannot use
+ * them at all. Furthermore the UID range beyond INT32_MAX (i.e. the range above the signed 32bit range) is
+ * icky, since many APIs cannot use it (example: setfsuid() returns the old UID as signed integer). Following
+ * our usual logic of assigning a 16bit UID range to each container, so that the upper 16bit of a 32bit UID
+ * value indicate kind of a "container ID" and the lower 16bit map directly to the intended user you can read
+ * this specific UID as the "nobody" user of the container with ID 0x7FFF, which is kinda nice. */
+#define UID_MAPPED_ROOT ((uid_t) (INT32_MAX-1))
+#define GID_MAPPED_ROOT ((gid_t) (INT32_MAX-1))
+
 #define ETC_PASSWD_LOCK_PATH "/etc/.pwd.lock"
 
 /* The following macros add 1 when converting things, since UID 0 is a valid UID, while the pointer
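The "nobody user of container 0x7FFF" reading in the comment can be checked with a little bit arithmetic. The sketch below is illustrative only: the two helper functions are hypothetical and not part of the patch, and `uint32_t` stands in for `uid_t` to keep the example free of platform headers.

```c
#include <stdint.h>

/* Same value as in the patch, with uint32_t standing in for uid_t. */
#define UID_MAPPED_ROOT ((uint32_t) (INT32_MAX-1))

/* Upper 16 bits of a 32-bit UID: the "container ID" in the comment's reading. */
uint16_t uid_container_id(uint32_t uid) {
        return (uint16_t) (uid >> 16);
}

/* Lower 16 bits of a 32-bit UID: the user within that container's 16-bit range. */
uint16_t uid_container_user(uint32_t uid) {
        return (uint16_t) (uid & 0xFFFFU);
}
```

`UID_MAPPED_ROOT` is 0x7FFFFFFE, so `uid_container_id()` gives 0x7FFF and `uid_container_user()` gives 0xFFFE, which is 65534, the same value as `UID_NOBODY`.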