examples/runners/wgpu: avoid holding onto multiple surfaces at the same time. #181


Merged: 1 commit merged into Rust-GPU:main from push-zqulmxwskwvp on Dec 18, 2024

Conversation

eddyb (Collaborator) commented on Dec 17, 2024

This unbreaks Wayland (I had been using WAYLAND_DISPLAY= cargo run ... for ages instead of investigating it; the cause turns out to have been something very silly).

This is what the bug looked like:

wp_linux_drm_syncobj_manager_v1#63: error 0: surface already exists
thread 'main' panicked at /home/eddy/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-22.1.0/src/device/global.rs:1930:25:
internal error: entered unreachable code: Fallback system failed to choose present mode. This is a bug. Mode: AutoVsync, Options: []
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I kept thinking this might be a Wayland protocol mismatch or something, but no, Mesa (the open-source GPU driver stack for Linux) has a bug:

We accidentally ended up with this broken scenario on Wayland (a sketch of the fix follows the list):

  • two wgpu::Surfaces for the same wl_surface (the Wayland window object)
    • (technically we had this issue on other platforms too, but they seem to care less)
  • both surfaces had .configure(...) called on them
    • AIUI, this is where vkCreateSwapchainKHR gets called
  • the second vkCreateSwapchainKHR fails to acquire an exclusive resource
    • i.e. wp_linux_drm_syncobj_manager_v1#63: error 0: surface already exists
    • however, due to that Mesa bug, this error isn't propagated to the caller
    • wgpu now thinks it has a valid swapchain for the second surface, too
  • the second Vulkan surface/swapchain is, however, partially broken
    • this makes various operations on that Vulkan surface/swapchain fail
    • in particular, wgpu fails to query various surface properties
    • somewhat indirectly, it finally panics when it fails to find a present mode
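
The fix is simply to never hold two wgpu::Surfaces for the same window. A minimal sketch of the pattern (hypothetical names and structure, not the actual runner code; assuming winit and wgpu 22):

```rust
// Hypothetical sketch of the fix pattern (illustrative names, not the
// actual runner code): drop the previous wgpu::Surface *before* creating
// a new one, so at most one swapchain exists per wl_surface at any time.
struct GraphicsState<'w> {
    surface: wgpu::Surface<'w>,
}

fn recreate_surface<'w>(
    instance: &wgpu::Instance,
    window: &'w winit::window::Window,
    state: &mut Option<GraphicsState<'w>>,
) -> Result<(), wgpu::CreateSurfaceError> {
    // Holding two surfaces for the same window is exactly what produced
    // "surface already exists" above, so tear the old one down first.
    *state = None;
    let surface = instance.create_surface(window)?;
    *state = Some(GraphicsState { surface });
    Ok(())
}
```

The idea is that dropping the old surface tears down its Vulkan surface/swapchain before the new vkCreateSwapchainKHR can race it for the wl_surface.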

With RUST_LOG=wgpu_hal=error I was able to see these VK_ERROR_SURFACE_LOST_KHR errors (which wgpu largely ignores, leading to 0 supported modes/formats):

[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_present_modes: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_formats: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_present_modes: ERROR_SURFACE_LOST_KHR
[2024-12-16T03:25:00Z ERROR wgpu_hal::vulkan::adapter] get_physical_device_surface_formats: ERROR_SURFACE_LOST_KHR

(Maybe we should run with at least the equivalent of RUST_LOG=error by default? I remember being frustrated that warn!/error! were silent while working on rustc self-profiling code: it didn't necessarily need nice user-facing diagnostics, but it also didn't have a good way to emit them from the separate measureme library anyway.)
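
For what it's worth, a sketch of what that default could look like (assuming the env_logger crate, not necessarily what the runner uses today): honor RUST_LOG when it's set, but fall back to the error level instead of total silence.

```rust
// Sketch only (assumes the env_logger crate): keep RUST_LOG configurable,
// but default to showing error!-level output instead of nothing.
fn main() {
    env_logger::Builder::from_env(
        env_logger::Env::default().default_filter_or("error"),
    )
    .init();

    // wgpu-hal's error! lines (e.g. ERROR_SURFACE_LOST_KHR above) would
    // then be visible even without RUST_LOG in the environment.
    log::error!("visible by default");
}
```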


Even with the Mesa bug fixed, nothing would prevent the second wgpu::Surface from being created (via instance.create_surface(&window)), but swapchain creation could at least fail with a better error (e.g. VK_ERROR_NATIVE_WINDOW_IN_USE_KHR), which would make the situation less confusing.
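
For illustration, roughly what that better failure mode could look like from the Rust side (a hypothetical sketch using the ash bindings, not actual wgpu-hal code):

```rust
use ash::vk;

// Hypothetical error handling around vkCreateSwapchainKHR: with the Mesa
// bug fixed, the second swapchain on the same wl_surface should fail
// loudly here instead of "succeeding" as a half-broken swapchain.
fn check_swapchain_creation(result: vk::Result) -> Result<(), String> {
    match result {
        vk::Result::SUCCESS => Ok(()),
        vk::Result::ERROR_NATIVE_WINDOW_IN_USE_KHR => {
            Err("window already in use (a second wgpu::Surface?)".into())
        }
        other => Err(format!("vkCreateSwapchainKHR failed: {other:?}")),
    }
}
```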

I've mentioned some of these interactions in this wgpu issue:

eddyb force-pushed the push-zqulmxwskwvp branch from f84f69b to edd713e on December 18, 2024 12:08
eddyb added this pull request to the merge queue on Dec 18, 2024
Merged via the queue into Rust-GPU:main with commit f069c58 on Dec 18, 2024 (7 checks passed)
eddyb deleted the push-zqulmxwskwvp branch on December 18, 2024 18:01