Name and Version
[docker@104ba42db8f2 ~]$ llama-server --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
version: 5234 (3e168be)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
7900 XTX under Arch Linux with the RADV Vulkan driver
Models
Qwen3-30B-A3B
Problem description & steps to reproduce
I am using the following command:
llama-server --port 9001 --metrics --slots -m /models/Qwen3-30B-A3B-UD-Q4_K_XL.gguf -ngl 999 --ctx-size 32768 --no-context-shift
When I use Open WebUI with short inputs, the model works fine, but with long inputs, or once the conversation grows too long, llama.cpp throws an assertion error:
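For anyone trying to reproduce without Open WebUI: the failure appears tied to prompts that exceed the configured context while --no-context-shift is set, so the server cannot evict old tokens. A minimal sketch of such a request against the server's OpenAI-compatible endpoint (the payload shape, rough token estimate, and port are assumptions based on the command above, not taken from my logs):

```python
import json

# Hypothetical oversized request: --ctx-size is 32768, so a prompt of
# roughly 40k tokens should overflow it. "word " tokenizes to about one
# token per repetition, so this is a crude but sufficient estimate.
long_text = "word " * 40000

payload = {
    "model": "Qwen3-30B-A3B",  # model name is illustrative
    "messages": [{"role": "user", "content": long_text}],
}

# Write the request body to a file for use with curl, e.g.:
#   curl http://localhost:9001/v1/chat/completions \
#        -H "Content-Type: application/json" -d @payload.json
with open("payload.json", "w") as f:
    json.dump(payload, f)
```

Short inputs through the same endpoint complete normally; only requests like the above hit the assertion.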
First Bad Commit
N/A
Relevant log output