Description
Each vCPU loop has an associated Propolis `Task` that is used to tell the vCPU execution loop to pause or exit in response to some kind of event. For example, a request to reset the VM tells all vCPU threads' tasks to pause. To handle the case where a vCPU is in guest context when a pause event arrives, each vCPU `TaskHdl` is given an associated barrier function (in `VcpuTasks::new`) that will try to evict the vCPU from the guest after the task is marked as paused.
The barrier function currently tries to read a guest register. This will indeed cause an exit if the vCPU is in the guest. But if the vCPU is about to enter the guest, this poke will be missed, and the vCPU task won't get a chance to pause until the next time it exits. That may never happen; for example, in #559, a system handled two triple-fault reset events and then wedged after logging the following:
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
vcpu = 0
16:47:00.464Z INFO propolis-server (vm_state_worker): State worker handled event
outcome = Continue
16:47:00.464Z INFO propolis-server (vm_state_worker): State worker handling event
event = Guest(VcpuSuspendTripleFault(3))
16:47:00.464Z INFO propolis-server (vm_state_worker): Resetting due to triple fault on vCPU 3
16:47:00.464Z INFO propolis-server (vm_state_worker): Resetting instance
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
vcpu = 2
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
vcpu = 3
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
vcpu = 1
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU paused
vcpu = 0
I suspect what happened here is the following:
- all four vCPUs were resumed after handling a triple fault reset
- vCPU 0 entered the guest but the other vCPU threads did not
- the state driver dequeued another triple fault reset event (see the linked comment on #559, "Hang on Helios guest restart")
- the state driver asked to pause all four vCPU tasks
- the vCPU barrier function successfully kicked vCPU 0 out of the guest and caused it to pause...
- ...but the barrier function had no effect on vCPUs 1-3, because they had not entered the guest yet
- vCPUs 1-3 then entered the guest and never exited, because they had just been reset and were waiting for init interrupts from vCPU 0, which was paused
To fix this, we (probably) need a more reliable way to tell the kernel VMM that the next attempt to enter the guest should exit immediately, so that the state driver can be sure that when it asks to pause vCPUs, they will evaluate their task states at least once more, irrespective of what they're doing when the pause request arrives.
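To make the difference concrete, here is a minimal single-threaded model of the two notification styles. It is only a sketch: `Vcpu`, `poke_transient`, `request_exit`, and `enter_guest` are hypothetical names made up for illustration and do not correspond to any actual Propolis or bhyve interface. The point is that a transient poke is lost when the vCPU has not entered the guest yet, while a sticky request is consumed by the next entry attempt.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the per-vCPU state the kernel VMM would track.
struct Vcpu {
    in_guest: AtomicBool,
    exit_pending: AtomicBool,
}

/// Outcome of one (simulated) attempt to enter the guest.
#[derive(Debug, PartialEq)]
enum Entry {
    RanGuest,
    ExitedImmediately,
}

impl Vcpu {
    fn new() -> Self {
        Vcpu {
            in_guest: AtomicBool::new(false),
            exit_pending: AtomicBool::new(false),
        }
    }

    /// Transient poke: only evicts a vCPU that is in the guest right now.
    /// Returns true if the poke had any effect.
    fn poke_transient(&self) -> bool {
        self.in_guest.swap(false, Ordering::SeqCst)
    }

    /// Sticky request: evicts the vCPU if it is in the guest, and also
    /// records that the next entry attempt must exit immediately.
    fn request_exit(&self) {
        self.exit_pending.store(true, Ordering::SeqCst);
        self.in_guest.store(false, Ordering::SeqCst);
    }

    /// Simulated guest entry: consumes a pending exit request if one exists.
    fn enter_guest(&self) -> Entry {
        if self.exit_pending.swap(false, Ordering::SeqCst) {
            return Entry::ExitedImmediately;
        }
        self.in_guest.store(true, Ordering::SeqCst);
        Entry::RanGuest
    }
}
```

In this model, calling `poke_transient` before `enter_guest` does nothing and the vCPU runs the guest anyway, which mirrors the behavior suspected for vCPUs 1-3 above. Calling `request_exit` first makes the next `enter_guest` return `Entry::ExitedImmediately`, giving the vCPU loop a chance to re-check its task state.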
Note that this is orthogonal to two other related problems stemming from #559:
- bhyve should provide enough information in VM_SUSPEND exits to allow Propolis to queue only a single event on triple-fault (see the above-linked comment)
- Propolis should do a better job of deduplicating/discarding events that came from old "generations" of a VM (i.e. when a system is reset, vCPU events from before the reset should be discarded)
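The second item could look roughly like the sketch below: each event is stamped with the VM generation in which it was produced, and the state driver drops anything from an older generation. All names here (`GuestEvent`, `Tagged`, `StateDriver`) are hypothetical and chosen for illustration; this is not how Propolis's state driver is currently structured.

```rust
/// Hypothetical guest event; only the triple-fault case is modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GuestEvent {
    TripleFault { vcpu: u32 },
}

/// An event stamped with the VM generation in which it was raised.
#[derive(Debug, Clone, Copy)]
struct Tagged {
    generation: u64,
    event: GuestEvent,
}

/// Hypothetical state driver that bumps its generation on every reset.
struct StateDriver {
    generation: u64,
    resets: u32,
}

impl StateDriver {
    fn new() -> Self {
        StateDriver { generation: 0, resets: 0 }
    }

    /// Stamp an event with the current generation.
    fn tag(&self, event: GuestEvent) -> Tagged {
        Tagged { generation: self.generation, event }
    }

    /// Handle an event; returns false if it was discarded as stale.
    fn handle(&mut self, tagged: Tagged) -> bool {
        if tagged.generation != self.generation {
            // Raised before the most recent reset: ignore it.
            return false;
        }
        match tagged.event {
            GuestEvent::TripleFault { .. } => {
                // Resetting starts a new generation.
                self.generation += 1;
                self.resets += 1;
            }
        }
        true
    }
}
```

With this scheme, two triple-fault events queued during the same generation would produce a single reset: the first one bumps the generation and the second is discarded as stale.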