| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Replace local implementations of mkdir_p in favour of using the
more robust implementation now added to libubox.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Replace unreachable error handling code in function setns_open with
a more appropriate assertion.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Pass loglevel to preloaded seccomp handler, output generated program
along with unresolved syscalls if debugging output is requested.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Break overly long line, add some comments.
No functional changes.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
In commit 3019f50 ("jail: leak less memory") memory handling in cgroups
related code was refactored. That allows to call cgroups_free()
unconditionally and remove the child-branch of in free_opts().
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
Restructure and add code to process rules based on syscall arguments as
defined in OCI run-tine spec. Generated BPF code became more efficient
as now only one BPF instruction for each syscall is required.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
So we are safe for the future.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Call to enter an existing cgroups namespace was missing. Add it.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Because that won't work. Use relatime instead.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
'-j' is wrong, it should be '-i' (for _i_mmediately).
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
| |
Resolve the userid in parent namespace mapped to the root user of the
new user namespace. Before clone(), seteuid() to that user in the parent
namespace.
Use SECBIT_NO_SETUID_FIXUP so the parent process can later on switch
back using seteuid(0).
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Make valgrind more happy
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Pre-calculate allocation length more simple and make sure maps are
properly generated.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
Move check for named jail up to main() function, and also add that
condition in case an OCI container is loaded as that would segfault
in case no name was given.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Always free everything before exiting, clean up dynamic structures,
add missing free() calls in various places, ...
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
| |
ujail's seccomp ld-preload support broke recently with
Error relocating /lib/libpreload-seccomp.so: debug: symbol not found
Fix that by adding a debug variable to seccomp.c.
Fixes: be6da62 ("seccomp: silence 'unknown syscall' warnings")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
"I'm sure you said cgroup2"
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Output them as debugging messages instead.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Drop the old OpenWrt-specific seccomp rule parser in favour of reusing
the OCI compliant variant.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Specifying the architecture used for system calls is optional in OCI
spec. Make it optional in the parser.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allocate enough stack space for capget()/capset() which requires
2*sizeof(struct __user_cap_data_struct), each containing 32-bit fields,
where the 2nd struct contains the bits for high (>32) capabilities.
Failing to do that not only leads to those high capabilities being
inaccessible but also overwrote the stack resulting in ujail hanging
infinitely instead of returning from applyOCIcapabilities().
Also adapt debugging output to 64-bit format.
Apart from that, don't set SECBIT_NO_SETUID_FIXUP when not actually
modifying capabilities explicitely, as that would result in ALL
capabilities retained in the subsequent setuid() call instead of
having them all dropped.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Add support for propagation mount options (private, slave, shared,
unbindable, rprivate, rslave, rshared, runbindable).
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
ujail tried to parse boolean values in config.json even if they were
not present which lead to segfaults.
Check if booleans are actually present before trying to parse them.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Work-around gcc bug which leads to segfault parsing ELF on MIPS64.
The codepath added in this commit gets triggered when parsing
/lib/ld-musl-mips64-sf.so.1 (a symlink to /lib/libc.so) on MIPS64
(built with gcc-8.4.0 and musl 1.1.24) in qemu-system-mips64 on the
malta/be64 target.
Include work-around outputting an error message, but preventing
segfault when building for MIPS64.
Tested-by: Roman Kuzmitskii <damex.pp@icloud.com>
[tested on edgerouter 4 and edgerouter lite]
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
| |
Mount /etc/resolv.conf, /etc/passwd, /etc/group and /etc/nsswitch.conf
read-only in ujail slim-containers.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
| |
Previsously capabilities could be defined for slim-containers using
our own JSON format, only allowing to modify capabilities in the
bouding set. As apparently that was never used by even a single
package, drop that old parser and logic in favour of reusing the now
existing OCI capability handling functions.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
| |
The previous commit
3121467 ("early: run ubusd non-root as user ubus, group ubus")
changed the path of the ubus socket from /var/run/ubus.sock to
/var/run/ubus/ubus.sock. Adapt jail to also mount-bind that new
path for jails which include ubus access (eg. dnsmasq).
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
OCI 'swap' value encodes memory+swap, make the best out of that.
Ignore 'kernel' and 'kernelTCP' values rather than returning with
error as kernel memory is accounted in the existing limits in cgroup2.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
| |
Prevent specifying directories by banning the use of '/' characters
and disallow some internal cgroup.* files as suggested in [1].
[1]: https://github.com/opencontainers/runtime-spec/pull/1040
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Fixes segfault on shutdown with slim containers.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Start pure cgroup2 implementation with emulation of (some) cgroup1
properties.
Initially support converting cpu, memory, blockIO, pids to unified in
addition to directly specifying unified attributes as suggested in
https://github.com/opencontainers/runtime-spec/pull/1040
Support for converting devices and network into BPF programs is
planned.
Now that containers have their representation in the unified cgroup
hierarchy, make sure using cgroup namespaces also produces meaningful
results.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
Just like pidns, timens is also only applied to children forked after
the setns() call, so use the same semantics here as well when joining
an existing time namespace.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using the the current container signal method to send a signal to the
jailed process works fine, as signals are being forwarded by the
ujail parent process. However, in case of KILL (==9) signal, both,
parent and jailed process are killed immediately which results in the
'poststop' OCI hook being skipped.
Add new 'kill' method to ujail's container object to allow sending
signals to the jailed process directly instead of having to send
signals to the parent.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
| |
* register ubus object for container to query state
* wait on 'created' state until 'start' command is issued via ubus
* have a way to bypass waiting on 'created' state
* support OCI annotations pass-through
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
| |
Make sure hook execution is completed before continueing with any
further actions. This involves a major refactoring ujail to use a
single uloop mainloop for each process to avoid congruency issues.
Also fix other remaining problems in code for OCI hooks, such as making
sure memory allocated to store hook information is zerod.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
| |
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
Allow OCI containers to specify paths to existing namespaces.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
| |
The proper format for size_t is %zu .
Signed-off-by: Rosen Penev <rosenp@gmail.com>
|
|
|
|
|
|
|
|
|
| |
Hack to make /proc/sys/net read-write while the rest of /proc/sys is
read-only which cannot be expressed with OCI spec, but happends to be
very useful. Only apply it if '/proc/sys' is not already listed as
mount, maskedPath or readonlyPath.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add default mounts of /dev, /dev/pts, /dev/shm, /proc and /sys to
the restructured mounts AVL list instead of calling mount directly.
While for slim containers this change shouldn't make any difference,
it allows OCI containers to override options of those default
filesystems.
The previous hack keeping /proc/sys/net mounted read-write if inside
a new network namespace while all the rest of /proc/sys is read-only
cannot easily be translated and is removed for now.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|
|
|
|
|
|
|
|
| |
OCI supplied filesystems-specific mount options have not been stored
in the add_mount() function. strdup() them there and free the original
string in the OCI function.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
|