path: root/jail
Commit message, Author, Age, Files, Lines
...
* treewide: replace local mkdir_p implementationsDaniel Golle2020-12-124-28/+3
| | | | | | | Drop local implementations of mkdir_p in favour of the more robust implementation now added to libubox. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: remove unreachable codeDaniel Golle2020-12-091-2/+1
| | | | | | | Replace unreachable error handling code in function setns_open with a more appropriate assertion. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: improve seccomp log outputDaniel Golle2020-12-015-13/+25
| | | | | | | Pass loglevel to preloaded seccomp handler, output generated program along with unresolved syscalls if debugging output is requested. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: seccomp: improve code readabilityDaniel Golle2020-11-301-10/+31
| | | | | | | Break overly long line, add some comments. No functional changes. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: always call cgroups_free()Daniel Golle2020-11-301-3/+1
| | | | | | | | In commit 3019f50 ("jail: leak less memory") memory handling in cgroups related code was refactored. That allows calling cgroups_free() unconditionally and removing the child branch in free_opts(). Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: improve seccomp BPF generatorDaniel Golle2020-11-302-22/+160
| | | | | | | | Restructure and add code to process rules based on syscall arguments as defined in the OCI runtime spec. The generated BPF code became more efficient as now only one BPF instruction is required for each syscall. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: properly initialize timens_fdDaniel Golle2020-11-271-0/+2
| | | | | | So we are safe for the future. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: enter existing cgroups namespace if givenDaniel Golle2020-11-271-0/+2
| | | | | | Call to enter an existing cgroups namespace was missing. Add it. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: don't attempt to mount /sys with noatimeDaniel Golle2020-11-271-1/+1
| | | | | | Because that won't work. Use relatime instead. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix typo in usage outputDaniel Golle2020-11-271-1/+1
| | | | | | '-j' is wrong, it should be '-i' (for _i_mmediately). Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: seteuid before clone(CLONE_NEWUSER)Daniel Golle2020-11-271-3/+33
| | | | | | | | | | Resolve the userid in parent namespace mapped to the root user of the new user namespace. Before clone(), seteuid() to that user in the parent namespace. Use SECBIT_NO_SETUID_FIXUP so the parent process can later on switch back using seteuid(0). Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: don't fail if can't mount-bind /etc/resolv.confDaniel Golle2020-11-271-2/+2
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: don't use NULL arguments for mount syscallDaniel Golle2020-11-272-9/+9
| | | | | | Make valgrind happier. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: relax /etc/resolv.conf creationDaniel Golle2020-11-271-5/+5
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix and simplify userns uid/gid maps from OCIDaniel Golle2020-11-271-36/+19
| | | | | | | Pre-calculate the allocation length in a simpler way and make sure maps are properly generated. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix segfault on missing name and refactorDaniel Golle2020-11-271-5/+16
| | | | | | | | Move check for named jail up to main() function, and also add that condition in case an OCI container is loaded as that would segfault in case no name was given. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: leak less memoryDaniel Golle2020-11-277-111/+185
| | | | | | | Always free everything before exiting, clean up dynamic structures, add missing free() calls in various places, ... Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: add 'debug' extern variable to preload_seccompDaniel Golle2020-11-221-0/+2
| | | | | | | | | ujail's seccomp ld-preload support broke recently with: "Error relocating /lib/libpreload-seccomp.so: debug: symbol not found". Fix that by adding a debug variable to seccomp.c. Fixes: be6da62 ("seccomp: silence 'unknown syscall' warnings") Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: cgroup hack: rewrite cgroup -> cgroup2Daniel Golle2020-11-211-1/+2
| | | | | | "I'm sure you said cgroup2" Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* seccomp: silence 'unknown syscall' warningsDaniel Golle2020-11-211-1/+1
| | | | | | Output them as debugging messages instead. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* seccomp: switch to new OCI compliant parserDaniel Golle2020-11-151-86/+6
| | | | | | | Drop the old OpenWrt-specific seccomp rule parser in favour of reusing the OCI compliant variant. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* seccomp: specifying architectures is optionalDaniel Golle2020-11-151-10/+17
| | | | | | | Specifying the architecture used for system calls is optional in OCI spec. Make it optional in the parser. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix capabilitiesDaniel Golle2020-11-072-17/+38
| | | | | | | | | | | | | | | | Allocate enough stack space for capget()/capset(), which require 2*sizeof(struct __user_cap_data_struct), each containing 32-bit fields, where the 2nd struct contains the bits for high (>= 32) capabilities. Failing to do that not only left those high capabilities inaccessible but also overwrote the stack, resulting in ujail hanging indefinitely instead of returning from applyOCIcapabilities(). Also adapt debugging output to 64-bit format. Apart from that, don't set SECBIT_NO_SETUID_FIXUP when not actually modifying capabilities explicitly, as that would result in ALL capabilities being retained in the subsequent setuid() call instead of having them all dropped. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
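The buffer-size requirement described above can be sketched as follows. This is an illustration only, not the ujail code: the helper name `read_effective_caps` is hypothetical, and it assumes a Linux system with the kernel UAPI header `linux/capability.h` available. With `_LINUX_CAPABILITY_VERSION_3` the kernel writes two `struct __user_cap_data_struct` elements, so passing a single struct lets the kernel write past the end of the buffer.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

/* Hypothetical helper: read the effective capability set of the
 * calling thread as one 64-bit mask. Returns 0 on success, -1 on error. */
static int read_effective_caps(uint64_t *effective)
{
	struct __user_cap_header_struct hdr;
	/* Version 3 of the ABI needs an array of two data structs:
	 * data[0] holds capabilities 0..31, data[1] holds 32..63. */
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	assert(effective != NULL);

	memset(&hdr, 0, sizeof(hdr));
	memset(data, 0, sizeof(data));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0; /* 0 means the calling thread */

	if (syscall(SYS_capget, &hdr, data) < 0)
		return -1;

	/* Combine the low and high 32-bit words into one 64-bit mask. */
	*effective = ((uint64_t)data[1].effective << 32) | data[0].effective;
	return 0;
}
```

A caller would print the resulting mask with a 64-bit format such as `PRIx64` from `<inttypes.h>`, in line with the 64-bit debugging output mentioned in the commit message.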
* jail: don't fail if maskedPath cannot be foundDaniel Golle2020-10-281-1/+1
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: add support for absolute root path in OCI specDaniel Golle2020-10-281-8/+15
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: relax seccomp unknown syscall handlingDaniel Golle2020-10-281-1/+2
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: handle mount propagation flagsDaniel Golle2020-10-283-39/+73
| | | | | | | Add support for propagation mount options (private, slave, shared, unbindable, rprivate, rslave, rshared, runbindable). Signed-off-by: Daniel Golle <daniel@makrotopia.org>
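One way to translate the propagation option names listed above into mount(2) flags is a small lookup table. This is a sketch under assumptions, not the parser from this commit: the function name `propagation_flags` is hypothetical, and the "r" variants are assumed to simply add `MS_REC` to the base flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: map a propagation option name to mount(2) flags.
 * Returns 0 if the name is not a propagation option. */
static unsigned long propagation_flags(const char *name)
{
	static const struct {
		const char *name;
		unsigned long flags;
	} map[] = {
		{ "private",     MS_PRIVATE },
		{ "slave",       MS_SLAVE },
		{ "shared",      MS_SHARED },
		{ "unbindable",  MS_UNBINDABLE },
		/* the recursive variants apply to the whole subtree */
		{ "rprivate",    MS_PRIVATE | MS_REC },
		{ "rslave",      MS_SLAVE | MS_REC },
		{ "rshared",     MS_SHARED | MS_REC },
		{ "runbindable", MS_UNBINDABLE | MS_REC },
	};
	size_t i;

	assert(name != NULL);

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(name, map[i].name))
			return map[i].flags;

	return 0;
}
```

Per mount(2), a propagation type is changed with its own mount() call on an existing mount point, so a caller would typically keep these flags separate from the regular mount flags.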
* jail: add option for pidfileDaniel Golle2020-10-281-1/+34
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: guard boolean blobmsg attributesDaniel Golle2020-10-281-3/+7
| | | | | | | | ujail tried to parse boolean values in config.json even if they were not present, which led to segfaults. Check if booleans are actually present before trying to parse them. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* ujail: elf: work around GCC bug on MIPS64Daniel Golle2020-10-231-0/+12
| | | | | | | | | | | | | | Work-around gcc bug which leads to segfault parsing ELF on MIPS64. The codepath added in this commit gets triggered when parsing /lib/ld-musl-mips64-sf.so.1 (a symlink to /lib/libc.so) on MIPS64 (built with gcc-8.4.0 and musl 1.1.24) in qemu-system-mips64 on the malta/be64 target. Include work-around outputting an error message, but preventing segfault when building for MIPS64. Tested-by: Roman Kuzmitskii <damex.pp@icloud.com> [tested on edgerouter 4 and edgerouter lite] Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: mount more stuff read-onlyDaniel Golle2020-10-221-4/+4
| | | | | | | Mount /etc/resolv.conf, /etc/passwd, /etc/group and /etc/nsswitch.conf read-only in ujail slim-containers. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: capabilities: apply in two phasesDaniel Golle2020-10-213-14/+27
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: nuke old capabilities code in favour of reusing OCI codeDaniel Golle2020-10-193-79/+19
| | | | | | | | | | Previously capabilities could be defined for slim-containers using our own JSON format, which only allowed modifying capabilities in the bounding set. As apparently that was never used by even a single package, drop that old parser and logic in favour of reusing the now existing OCI capability handling functions. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: adapt to new ubus socket pathDaniel Golle2020-10-191-1/+1
| | | | | | | | | | The previous commit 3121467 ("early: run ubusd non-root as user ubus, group ubus") changed the path of the ubus socket from /var/run/ubus.sock to /var/run/ubus/ubus.sock. Adapt jail to also mount-bind that new path for jails which include ubus access (e.g. dnsmasq). Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* cgroups: memory controller fixesDaniel Golle2020-08-131-7/+18
| | | | | | | | OCI 'swap' value encodes memory+swap, make the best out of that. Ignore 'kernel' and 'kernelTCP' values rather than returning with error as kernel memory is accounted in the existing limits in cgroup2. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
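The arithmetic behind the 'swap' conversion can be sketched like this. It is an illustration with assumed semantics, not the code from this commit: OCI `memory.swap` is taken to be the combined memory+swap limit, cgroup2 `memory.swap.max` is taken to be the swap-only limit, and negative values are treated as "unlimited". The function name `oci_swap_to_cgroup2` is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: derive the value for cgroup2 memory.swap.max
 * from the OCI memory limit and the OCI memory+swap limit.
 * Returns 0 on success, -1 if no sensible value can be derived. */
static int oci_swap_to_cgroup2(int64_t limit, int64_t swap,
			       char *out, size_t outlen)
{
	assert(out != NULL && outlen > 0);

	/* Assumption: a negative combined limit means unlimited swap. */
	if (swap < 0) {
		snprintf(out, outlen, "max");
		return 0;
	}

	/* Without a finite memory limit, or with a combined limit below
	 * the memory limit, the swap-only share cannot be computed. */
	if (limit < 0 || swap < limit)
		return -1;

	/* swap-only limit = (memory + swap) - memory */
	snprintf(out, outlen, "%" PRId64, swap - limit);
	return 0;
}
```

For example, a memory limit of 100 with a combined limit of 150 yields a swap-only limit of 50.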
* cgroups: restrict allowed keys in 'unified' sectionDaniel Golle2020-08-131-0/+8
| | | | | | | | | Prevent specifying directories by banning the use of '/' characters and disallow some internal cgroup.* files as suggested in [1]. [1]: https://github.com/opencontainers/runtime-spec/pull/1040 Signed-off-by: Daniel Golle <daniel@makrotopia.org>
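A key filter of the kind described above could look roughly like the following. This is a sketch, not the check added by this commit: the function name `unified_key_allowed` is hypothetical, and the exact list of blocked `cgroup.*` files is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether a key from the OCI 'unified'
 * section may be written as a file in the container's cgroup directory. */
static bool unified_key_allowed(const char *key)
{
	/* Assumed list of internal control files; the real list may differ. */
	static const char *const blocked[] = {
		"cgroup.procs",
		"cgroup.threads",
		"cgroup.subtree_control",
		"cgroup.freeze",
	};
	size_t i;

	assert(key != NULL);

	/* An empty key cannot name a file. */
	if (!*key)
		return false;

	/* A '/' would allow addressing other directories. */
	if (strchr(key, '/'))
		return false;

	for (i = 0; i < sizeof(blocked) / sizeof(blocked[0]); i++)
		if (!strcmp(key, blocked[i]))
			return false;

	return true;
}
```

With this filter, a key such as `memory.high` passes, while `../memory.high` and `cgroup.procs` are rejected.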
* jail: fix freeing cgroups avlDaniel Golle2020-08-062-2/+5
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: only free cgroups if they were allocatedDaniel Golle2020-08-061-1/+2
| | | | | | Fixes segfault on shutdown with slim containers. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: parse OCI cgroups resourcesDaniel Golle2020-08-063-30/+899
| | | | | | | | | | | | | | | | | Start pure cgroup2 implementation with emulation of (some) cgroup1 properties. Initially support converting cpu, memory, blockIO, pids to unified in addition to directly specifying unified attributes as suggested in https://github.com/opencontainers/runtime-spec/pull/1040 Support for converting devices and network into BPF programs is planned. Now that containers have their representation in the unified cgroup hierarchy, make sure using cgroup namespaces also produces meaningful results. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: make use of BLOBMSG_CAST_INT64 for OCI rlimitsDaniel Golle2020-08-061-53/+39
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: use pidns semantics also for timensDaniel Golle2020-08-061-20/+20
| | | | | | | | Just like pidns, timens is also only applied to children forked after the setns() call, so use the same semantics here as well when joining an existing time namespace. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: add 'kill' method to container.%s objectDaniel Golle2020-07-291-0/+39
| | | | | | | | | | | | | Using the current container signal method to send a signal to the jailed process works fine, as signals are being forwarded by the ujail parent process. However, in case of the KILL (==9) signal, both the parent and the jailed process are killed immediately, which results in the 'poststop' OCI hook being skipped. Add a new 'kill' method to ujail's container object to allow sending signals to the jailed process directly instead of having to send signals to the parent. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: add some remaining OCI featuresDaniel Golle2020-07-281-60/+202
| | | | | | | | | * register ubus object for container to query state * wait on 'created' state until 'start' command is issued via ubus * have a way to bypass waiting on 'created' state * support OCI annotations pass-through Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: serialize hook executionDaniel Golle2020-07-261-167/+239
| | | | | | | | | | Make sure hook execution is completed before continuing with any further actions. This involves a major refactoring of ujail to use a single uloop mainloop for each process to avoid concurrency issues. Also fix other remaining problems in the code for OCI hooks, such as making sure memory allocated to store hook information is zeroed. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix build on glibc and uclibcDaniel Golle2020-07-251-0/+11
| | | | Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: add support for referencing existing namespacesDaniel Golle2020-07-211-15/+185
| | | | | | Allow OCI containers to specify paths to existing namespaces. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: fix wrong format for 32-bitRosen Penev2020-07-201-1/+1
| | | | | | The proper format for size_t is %zu . Signed-off-by: Rosen Penev <rosenp@gmail.com>
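A minimal illustration of the format fix, not the actual ujail code: `size_t` is only 32 bits wide on 32-bit targets, so a conversion such as `%lu` or `%llu` can mismatch there, while `%zu` matches `size_t` on every target. The helper name `format_len` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a size_t value into buf.
 * Returns the number of characters snprintf() would have written. */
static int format_len(char *buf, size_t buflen, size_t value)
{
	assert(buf != NULL && buflen > 0);

	/* %zu is the conversion for size_t, whatever its width. */
	return snprintf(buf, buflen, "%zu", value);
}
```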
* jail: re-implement /proc/sys/net read-write in netns hackDaniel Golle2020-07-203-7/+62
| | | | | | | | | Hack to make /proc/sys/net read-write while the rest of /proc/sys is read-only, which cannot be expressed with the OCI spec but happens to be very useful. Only apply it if '/proc/sys' is not already listed as mount, maskedPath or readonlyPath. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: refactor default mounts into new structureDaniel Golle2020-07-201-36/+16
| | | | | | | | | | | | | Add default mounts of /dev, /dev/pts, /dev/shm, /proc and /sys to the restructured mounts AVL list instead of calling mount directly. While for slim containers this change shouldn't make any difference, it allows OCI containers to override options of those default filesystems. The previous hack keeping /proc/sys/net mounted read-write if inside a new network namespace while all the rest of /proc/sys is read-only cannot easily be translated and is removed for now. Signed-off-by: Daniel Golle <daniel@makrotopia.org>
* jail: actually apply filesystem-specific mount optionsDaniel Golle2020-07-201-1/+10
| | | | | | | | OCI-supplied filesystem-specific mount options were not stored in the add_mount() function. strdup() them there and free the original string in the OCI function. Signed-off-by: Daniel Golle <daniel@makrotopia.org>