Name and Version
version: 5835 (6491d6e)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
Ryzen 7 9700X + Radeon RX 7800 XT
Models
Mistral-Small IQ4_XS
Problem description & steps to reproduce
Mistral-Small does not work correctly on the Vulkan backend. The first tokens after a prompt are generated correctly, but after around 200 tokens the generation speed drops substantially and the output becomes incoherent.
First Bad Commit
This issue was first introduced in commit 8875523.
Relevant log output
vk_amdvlk ./llama-server -t 8 -fa -ctk q8_0 -ctv q8_0 -c 32768 -m ~/Applications/chat/gguf/Mistral-Small-3.2-24B-Instruct-2506-IQ4_XS.gguf -ngl 100