author | Colin Walters <walters@verbum.org> | 2022-04-22 18:46:28 -0400 |
---|---|---|
committer | Colin Walters <walters@verbum.org> | 2022-04-26 13:02:46 -0400 |
commit | d3d3e4ea13944911a243690523d941ed0b4b0041 | |
tree | ebcdef01adeac7a9711cb8c04b1f2ee8d5e114e3 /tests/kolainst | |
parent | 98587a72db9b52eee63b4bfa9c47a77d2e327501 | |
Add an `ostree-boot-complete.service` to propagate staging failures

Quite a while ago we added staged deployments, which solved
a bunch of issues around the `/etc` merge. However, a persistent
problem since then is that any failure in that process during
the *previous* boot is not very visible.

We ship custom code in `rpm-ostree status` to query the previous
boot's journal. But that has a few problems. First, on systems
that have been up a while, the failure message may have been
rotated out. Second, some systems may not have a persistent
journal at all.

A general thing we do in e.g. Fedora CoreOS testing is to check
for systemd unit failures. We do that in our automated tests,
and we also ship code that displays them on ssh logins. Beyond
that, a lot of other projects do the same; it's easy via
`systemctl --failed`.
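
As a rough illustration of that kind of check (a sketch, not the code Fedora CoreOS actually ships), a small helper can turn the list of failed units into an exit status. `report_failed_units` is a hypothetical name; it assumes input with one unit per line and the unit name in the first column, as printed by `systemctl --failed --no-legend --plain`.

```shell
# Hypothetical helper: read a failed-unit listing on stdin, print the unit
# names, and return non-zero if at least one unit is listed.
# Assumes one unit per line with the unit name in the first column.
report_failed_units() {
    failed=$(awk 'NF { print $1 }')
    if [ -n "$failed" ]; then
        printf '%s\n' "$failed"
        return 1
    fi
    return 0
}
```

On a systemd host this could be used as `systemctl --failed --no-legend --plain | report_failed_units`.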

So to make failures more visible, change our `ostree-finalize-staged.service`
to have an internal wrapper around the process that "catches" any
errors and copies the error message into a file in `/boot/ostree`.
Then, a new `ostree-boot-complete.service` looks for this file on
startup, re-emits the error message, and fails.

It also deletes the file. The rationale is to avoid *continually*
warning. For example, we need to handle the case where an upgrade
process creates a new staged deployment. Now, we could change the
ostree core code to delete the warning file when that happens instead,
but this is trying to be a conservative change.
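
The flow described above can be sketched roughly in shell. This is only an illustration under stated assumptions: the real wrapper is internal to ostree, and `finalize_wrapper`, `boot_complete`, `failure_file`, `STATE_DIR`, and the `finalize-failure.stamp` file name are all hypothetical names chosen for this sketch.

```shell
# Hypothetical sketch only: the real wrapper is internal to ostree, and the
# file name below is an assumption made for illustration.
failure_file() {
    printf '%s/finalize-failure.stamp\n' "${STATE_DIR:-/boot/ostree}"
}

# Shutdown side: run the given command; if it fails, save its stderr to the
# failure file so the message survives the reboot.
finalize_wrapper() {
    err_tmp=$(mktemp) || return 1
    rc=0
    "$@" 2>"$err_tmp" || rc=$?
    if [ "$rc" -ne 0 ]; then
        cat "$err_tmp" > "$(failure_file)"
    fi
    cat "$err_tmp" >&2
    rm -f "$err_tmp"
    return "$rc"
}

# Boot side: if a failure was recorded, delete the file (so the warning is
# emitted only once), re-emit the message, and fail.
boot_complete() {
    f=$(failure_file)
    [ -f "$f" ] || return 0
    msg=$(cat "$f")
    rm -f "$f"
    echo "error: ostree-finalize-staged.service failed on previous boot: $msg" >&2
    return 1
}
```

Deleting the file inside `boot_complete` mirrors the conservative choice above: the unit fails once on the boot after the error, and a later boot starts clean.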

This should make failures here much more visible as is.

Diffstat (limited to 'tests/kolainst')
-rwxr-xr-x | tests/kolainst/destructive/staged-deploy.sh | 12 |
1 files changed, 12 insertions, 0 deletions
diff --git a/tests/kolainst/destructive/staged-deploy.sh b/tests/kolainst/destructive/staged-deploy.sh
index df40f115..7e1991bb 100755
--- a/tests/kolainst/destructive/staged-deploy.sh
+++ b/tests/kolainst/destructive/staged-deploy.sh
@@ -146,6 +146,18 @@ EOF
     # Cleanup refs
     ostree refs --delete staged-deploy nonstaged-deploy
     echo "ok cleanup refs"
+
+    # Now finally, try breaking staged updates and verify that ostree-boot-complete fails on the next boot
+    unshare -m /bin/sh -c 'mount -o remount,rw /boot; chattr +i /boot'
+    rpm-ostree kargs --append=foo=bar
+    /tmp/autopkgtest-reboot "3"
+    ;;
+  "3")
+    (systemctl status ostree-boot-complete.service || true) | tee out.txt
+    assert_file_has_content out.txt 'error: ostree-finalize-staged.service failed on previous boot.*Operation not permitted'
+    systemctl show -p Result ostree-boot-complete.service > out.txt
+    assert_file_has_content out.txt='Result=exit-code'
+    echo "ok boot-complete.service"
     ;;
   *) fatal "Unexpected AUTOPKGTEST_REBOOT_MARK=${AUTOPKGTEST_REBOOT_MARK}" ;;
 esac