VcpuTasks::pause_all is not guaranteed to pause all vCPUs successfully #561

@gjcolombo

Each vCPU loop has an associated Propolis Task that is used to tell the vCPU execution loop to pause or exit in response to some kind of event. For example, a request to reset the VM tells all vCPU threads' tasks to pause. To handle the case where a vCPU is in guest context when a pause event arrives, each vCPU TaskHdl is given an associated barrier function (in VcpuTasks::new) that will try to evict the vCPU from the guest after the task is marked as paused.
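
As a rough illustration of the ordering described above (mark the task paused, then run the barrier), here is a minimal sketch. `SketchTaskHandle` and its methods are hypothetical names invented for this example; they are not the actual Propolis `TaskHdl` API, and the barrier is modeled as an arbitrary closure.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a per-vCPU task handle (not the real `TaskHdl`).
pub struct SketchTaskHandle {
    /// Set when the state driver wants the vCPU loop to pause.
    pause_requested: AtomicBool,
    /// Called after the pause flag is set, to try to evict the vCPU
    /// from guest context so that its loop notices the flag.
    barrier: Box<dyn Fn() + Send + Sync>,
}

impl SketchTaskHandle {
    pub fn new(barrier: impl Fn() + Send + Sync + 'static) -> Self {
        Self {
            pause_requested: AtomicBool::new(false),
            barrier: Box::new(barrier),
        }
    }

    /// Mark the task as paused first, then run the barrier.
    pub fn request_pause(&self) {
        self.pause_requested.store(true, Ordering::SeqCst);
        (self.barrier)();
    }

    /// The vCPU loop would check this each time it returns from the guest.
    pub fn is_pause_requested(&self) -> bool {
        self.pause_requested.load(Ordering::SeqCst)
    }
}
```

In this sketch the vCPU loop only observes the flag when it next calls `is_pause_requested`, which is why the barrier has to force the vCPU back out of the guest.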

The barrier function currently tries to read a guest register. This will indeed cause an exit if the vCPU is in the guest. But if the vCPU is about to enter the guest, this poke will be missed, and the vCPU task won't get a chance to pause until the next time it exits. That may never happen; for example, in #559, a system handled two triple-fault reset events and then wedged after logging the following:

16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
    vcpu = 0
16:47:00.464Z INFO propolis-server (vm_state_worker): State worker handled event
    outcome = Continue
16:47:00.464Z INFO propolis-server (vm_state_worker): State worker handling event
    event = Guest(VcpuSuspendTripleFault(3))
16:47:00.464Z INFO propolis-server (vm_state_worker): Resetting due to triple fault on vCPU 3
16:47:00.464Z INFO propolis-server (vm_state_worker): Resetting instance
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
    vcpu = 2
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
    vcpu = 3
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU released from hold
    vcpu = 1
16:47:00.464Z INFO propolis-server (vcpu_tasks): vCPU paused
    vcpu = 0

I suspect what has happened here is that

  • all four vCPUs were resumed after handling a triple fault reset
  • vCPU 0 entered the guest but the other vCPU threads did not
  • the state driver dequeued another triple fault reset event (see the comment on #559, "Hang on Helios guest restart")
  • the state driver asked to pause all four vCPU tasks
  • the vCPU barrier function successfully kicked vCPU 0 out of the guest and caused it to pause...
  • ...but the barrier function had no effect on vCPUs 1-3, because they had not entered the guest yet
  • vCPUs 1-3 then entered the guest and never exited, because they had just been reset and were waiting for init interrupts from vCPU 0, which was paused

To fix this, we (probably) need a more reliable way to tell the kernel VMM that the next attempt to enter the guest should exit immediately, so that the state driver can be sure that when it asks to pause vCPUs, they will evaluate their task states at least once more, irrespective of what they're doing when the pause request arrives.
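
To make the difference concrete, here is a small single-threaded model of the two behaviors: a poke that only works while the vCPU is in the guest, and a request that is remembered until the next entry attempt. `VcpuModel`, `poke`, `request_exit`, and `try_enter` are all hypothetical names for this sketch; they do not correspond to any existing bhyve or Propolis interface, and a real implementation would need the kernel to check the pending flag atomically with guest entry.

```rust
/// Result of one attempt to enter the guest.
#[derive(Debug, PartialEq, Eq)]
pub enum Entry {
    /// The vCPU went into guest context.
    EnteredGuest,
    /// A pending request forced an immediate return to userspace.
    ExitedImmediately,
}

/// Hypothetical model of the per-vCPU state involved in the race.
#[derive(Debug, Default)]
pub struct VcpuModel {
    in_guest: bool,
    exit_pending: bool,
}

impl VcpuModel {
    /// Poke modeled on the current barrier: it evicts the vCPU only if it
    /// is in the guest right now. Returns whether an eviction happened.
    pub fn poke(&mut self) -> bool {
        let was_in_guest = self.in_guest;
        self.in_guest = false;
        was_in_guest
    }

    /// Sticky request: evict the vCPU if it is in the guest, and also
    /// remember the request so the next entry attempt exits immediately.
    pub fn request_exit(&mut self) {
        self.exit_pending = true;
        self.poke();
    }

    /// One attempt to enter the guest. A pending request is consumed here.
    pub fn try_enter(&mut self) -> Entry {
        if self.exit_pending {
            self.exit_pending = false;
            return Entry::ExitedImmediately;
        }
        self.in_guest = true;
        Entry::EnteredGuest
    }
}
```

In this model, calling `poke` on a vCPU that has not yet entered the guest does nothing, and the following `try_enter` proceeds into the guest, which is the suspected wedge. Calling `request_exit` instead makes the following `try_enter` come straight back, giving the vCPU loop a chance to re-check its task state.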

Note that this is orthogonal to two other related problems stemming from #559:

  • bhyve should provide enough information in VM_SUSPEND exits to allow Propolis to queue only a single event on triple-fault (see the above-linked comment)
  • Propolis should do a better job of deduplicating/discarding events that came from old "generations" of a VM (i.e. when a system is reset, vCPU events from before the reset should be discarded)
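
One possible shape for the generation-based discarding in the second bullet is sketched below. The types `TaggedEvent` and `EventFilter` are hypothetical and only illustrate the idea of tagging each vCPU event with the VM generation it was raised in; nothing here reflects existing Propolis code.

```rust
/// Hypothetical vCPU event tagged with the VM generation it came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TaggedEvent {
    pub generation: u64,
    pub vcpu: u32,
}

/// Hypothetical filter owned by the state driver.
#[derive(Debug, Default)]
pub struct EventFilter {
    current_generation: u64,
}

impl EventFilter {
    /// Generation to stamp on events raised from now on.
    pub fn current_generation(&self) -> u64 {
        self.current_generation
    }

    /// Called when the VM is reset; events tagged with an earlier
    /// generation become stale.
    pub fn on_reset(&mut self) {
        self.current_generation += 1;
    }

    /// Returns the event if it belongs to the current generation,
    /// or `None` if it was raised before the most recent reset.
    pub fn admit(&self, event: TaggedEvent) -> Option<TaggedEvent> {
        if event.generation == self.current_generation {
            Some(event)
        } else {
            None
        }
    }
}
```

With a filter like this, two triple-fault events queued before a reset would lead to one reset: the first is admitted and triggers `on_reset`, and the second no longer matches the current generation and is dropped.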
