Commit fe75373

test: wait until the nspawn process is completely dead (systemd#39576)
Before calling io.systemd.MachineImage.List.

The systemd-nspawn process takes a lock in the run() function in nspawn.c and holds it for the entire runtime of that function. If we call `machinectl terminate` the machine gets unregistered _before_ we release the lock, so the original `machinectl status` check would return early, allowing for a race where we call io.systemd.MachineImage.List over Varlink when systemd-nspawn still holds the lock because the process is still running:

```
[ 41.691826] TEST-13-NSPAWN.sh[1102]: + machinectl terminate long-running
[ 41.695009] systemd-nspawn[2171]: Trying to halt container by sending TERM to container PID 1. Send SIGTERM again to trigger immediate termination.
[ 41.698235] systemd-machined[1192]: Machine long-running terminated.
[ 41.709520] TEST-13-NSPAWN.sh[1102]: + systemctl kill --signal=KILL [email protected]
[ 41.709169] systemd-nspawn[2171]: Failed to unregister machine: No machine 'long-running' known
[ 41.720869] TEST-13-NSPAWN.sh[2346]: + varlinkctl --more call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{}'
[ 41.723359] TEST-13-NSPAWN.sh[2347]: + grep long-running
...
[ 41.735453] TEST-13-NSPAWN.sh[2352]: + varlinkctl call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{"name":"long-running", "acquireMetadata": "yes"}'
[ 41.736222] TEST-13-NSPAWN.sh[2353]: + grep OSRelease
[ 41.739500] TEST-13-NSPAWN.sh[2352]: Method call io.systemd.MachineImage.List() failed: Device or resource busy
[ 41.740641] systemd[1]: Received SIGCHLD.
[ 41.740670] systemd[1]: Child 2171 (systemd-nspawn) died (code=killed, status=9/KILL)
[ 41.740725] systemd[1]: [email protected]: Child 2171 belongs to [email protected].
[ 41.740748] systemd[1]: [email protected]: Main process exited, code=killed, status=9/KILL
[ 41.740755] systemd[1]: [email protected]: Will spawn child (service_enter_stop_post): systemd-nspawn
[ 41.740872] systemd[1]: [email protected]: About to execute: systemd-nspawn --cleanup --machine=long-running
...
```

Let's mitigate this by waiting until the corresponding [email protected] instance enters the 'inactive' state where the lock should be properly released.

Resolves: systemd#39547
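The race in the message comes down to a lock that lives as long as its holder, while the "is it gone yet?" check looks at something released earlier. A minimal sketch of that shape, assuming util-linux `flock(1)` is available; it uses a generic advisory lock on a scratch file, not the lock systemd-nspawn actually takes, and `hold_lock` and `try_lock` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Illustration only: a generic flock(1) lock, not nspawn's own locking code.

# hold_lock FILE SECS: take an exclusive lock on FILE and keep it until the
# subshell (and the sleep that inherits fd 9) has exited.
hold_lock() {
    local file="$1" secs="$2"
    (
        exec 9>"$file"
        flock -x 9
        sleep "$secs"
    )
}

# try_lock FILE: succeed only if the lock can be taken right now; this fails
# for as long as a holder process is still alive.
try_lock() {
    flock -n "$1" true
}
```

While a `hold_lock` process is running, `try_lock` on the same file fails, whatever any other bookkeeping says about the holder. It only succeeds once that process has fully exited, which is why the fix waits on the process rather than on the machine registration.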
2 parents 95d4490 + ed49036 commit fe75373

1 file changed: +10 -3 lines changed
test/units/TEST-13-NSPAWN.machined.sh

Lines changed: 10 additions & 3 deletions
@@ -48,6 +48,7 @@ trap 'touch /terminate; kill 0' RTMIN+3
 trap 'touch /poweroff' RTMIN+4
 trap 'touch /reboot' INT
 trap 'touch /trap' TRAP
+trap 'exit 0' TERM
 trap 'kill $PID' EXIT
 
 # We need to wait for the sleep process asynchronously in order to allow
@@ -325,6 +326,7 @@ ip address add 192.0.2.1/24 dev hoge
 PID=0
 
 trap 'kill 0' RTMIN+3
+trap 'exit 0' TERM
 trap 'kill $PID' EXIT
 
 # We need to wait for the sleep process asynchronously in order to allow
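Both hunks add a `TERM` handler to the scripts that act as the container's init. The commit message does not say why. A likely reason is that `machinectl terminate` makes nspawn send TERM to container PID 1, and the kernel only delivers a signal to a PID namespace's init if a handler is installed for it. The sketch below shows the handler's effect on an ordinary process: with the trap the script exits 0 on SIGTERM, where a shell without it would be killed by the signal (reported as status 143). `FAKE_INIT` and `run_and_terminate` are hypothetical names, and the script is a stand-in, not the test's real init script:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the container init script, reduced to the traps
# shown in the diff plus an asynchronous sleep.
FAKE_INIT='
PID=0
trap "exit 0" TERM
trap "kill \$PID 2>/dev/null" EXIT
sleep 30 &
PID=$!
wait "$PID"
'

# run_and_terminate: start the script, send it SIGTERM, print its exit status.
run_and_terminate() {
    bash -c "$FAKE_INIT" &
    local pid=$!
    sleep 1                  # give the script time to install its traps
    kill -TERM "$pid"
    local rc=0
    wait "$pid" || rc=$?
    echo "$rc"
}
```

The `wait` on a background `sleep` mirrors the comment in the surrounding context lines: bash runs a trap only once the foreground command returns, and `wait` is interruptible, so the handler runs promptly.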
@@ -439,9 +441,14 @@ varlinkctl call /run/systemd/machine/io.systemd.Machine io.systemd.Machine.OpenR
 
 # Terminating machine, otherwise acquiring image metadata by io.systemd.MachineImage.List may fail in the below.
 machinectl terminate long-running
-# wait for the container being stopped, otherwise acquiring image metadata by io.systemd.MachineImage.List may fail in the below.
-timeout 30 bash -c "while machinectl status long-running &>/dev/null; do sleep .5; done"
-systemctl kill --signal=KILL [email protected] || :
+# Wait for the container to stop, otherwise acquiring image metadata by io.systemd.MachineImage.List below
+# may fail.
+#
+# We need to wait until the systemd-nspawn process is completely stopped, as the lock is held for almost the
+# entire life of the process (see the run() function in nspawn.c). This means that the machine gets
+# unregistered _before_ this lock is lifted which makes `machinectl status` return non-zero EC earlier than
+# we need.
+timeout 30 bash -xec 'until [[ "$(systemctl show -P ActiveState [email protected])" == inactive ]]; do sleep .5; done'
 
 # test io.systemd.MachineImage.List
 varlinkctl --more call /run/systemd/machine/io.systemd.MachineImage io.systemd.MachineImage.List '{}' | grep 'long-running'
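The new wait loop polls a state query until it reports the expected value, bounded by `timeout`. If the same pattern were needed in several places, it could be factored into a helper along these lines. This is only a sketch: `wait_for_state` is a hypothetical name that is not part of the test suite, and the state query is passed in as a command so the helper does not depend on a particular unit:

```shell
#!/usr/bin/env bash
# wait_for_state EXPECTED TIMEOUT_SECS CMD [ARGS...]
# Polls CMD every 0.5s until its output equals EXPECTED. Returns 0 on success
# and 1 after roughly TIMEOUT_SECS seconds without a match.
wait_for_state() {
    local expected="$1" tries=$(( $2 * 2 ))
    shift 2
    while [ "$("$@" 2>/dev/null)" != "$expected" ]; do
        tries=$(( tries - 1 ))
        [ "$tries" -gt 0 ] || return 1
        sleep 0.5
    done
}

# Possible use with systemd, where UNIT is a placeholder for the service
# instance to watch:
#   wait_for_state inactive 30 systemctl show -P ActiveState "$UNIT"
```

Unlike `machinectl status`, which tracks the machine registration, `ActiveState` tracks the service itself. It becomes `inactive` only after the main process is gone and the stop-post step seen in the log (`systemd-nspawn --cleanup`) has finished.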
