
Eval bug: Qwen3 30B-A3B throwing assertion error with Vulkan backend #13233


Closed
Mushoz opened this issue May 1, 2025 · 2 comments

Mushoz commented May 1, 2025

Name and Version

[docker@104ba42db8f2 ~]$ llama-server --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 5234 (3e168be)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

7900XTX under archlinux with the radv vulkan driver

Models

Qwen3-30B-A3B

Problem description & steps to reproduce

I am using the following command:

llama-server --port 9001 --metrics --slots -m /models/Qwen3-30B-A3B-UD-Q4_K_XL.gguf -ngl 999 --ctx-size 32768 --no-context-shift

When using open-webui, the model works fine with short inputs, but with long inputs, or once the conversation grows long enough, llama.cpp throws an assertion error:

slot update_slots: id  0 | task 2322 | prompt done, n_past = 668, n_tokens = 609
/home/docker/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:5059: GGML_ASSERT(nei0 * nei1 <= 3072) failed
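The failing assert bounds the product `nei0 * nei1` at 3072. A rough sketch of why a batch size of 384 is the apparent limit, assuming `nei0` corresponds to the number of experts routed per token (8 for Qwen3-30B-A3B) and `nei1` scales with the number of tokens processed in a batch (this interpretation is an assumption, not confirmed in the thread):

```python
# Hypothetical reading of GGML_ASSERT(nei0 * nei1 <= 3072) in ggml-vulkan.cpp
MAX_PRODUCT = 3072    # hard limit checked by the assert
N_EXPERT_USED = 8     # experts activated per token in Qwen3-30B-A3B (assumed mapping to nei0)

# Largest per-batch token count that keeps the product within the limit
max_batch = MAX_PRODUCT // N_EXPERT_USED
print(max_batch)  # 384
```

This would line up with the workaround reported below of keeping the batch size at or under 384.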

First Bad Commit

N/A

Relevant log output

See above
stduhpf (Contributor) commented May 1, 2025

See #13164. As a workaround, you can try setting a small batch size (under 384 works for me).
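For reference, the batch size can be lowered with llama-server's `-b` / `--batch-size` flag. A sketch of the reporter's original command with the workaround applied (the value 256 is illustrative; any value under 384 reportedly works):

```shell
llama-server --port 9001 --metrics --slots \
  -m /models/Qwen3-30B-A3B-UD-Q4_K_XL.gguf \
  -ngl 999 --ctx-size 32768 --no-context-shift \
  -b 256
```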

Mushoz (Author) commented May 1, 2025

Thanks for the pointer, closing as a duplicate.

Mushoz closed this as completed May 1, 2025