Name and Version

llama-cli --version
version: 5336 (053367d)
built with gcc-12.4 (GCC) 12.4.0 for x86_64-redhat-linux

Operating systems

Linux

GGML backends

Vulkan

Hardware

AMD RX 7600

Models

Phi-4-mini-reasoning-Q8_0.gguf

Problem description & steps to reproduce

The server crashes after a few minutes of inference.

./llama-server -m /models/Phi-4-mini-reasoning-Q8_0.gguf -t 8 --batch-size 2048 --ubatch-size 1024 -fa -ctk q8_0 -ctv q8_0 --gpu-layers 99 -c 32768 --temp 0.8 --top-p 0.95 --min-p 0 --jinja
First Bad Commit

No response

Relevant log output

/media/build/llama/ggml/src/ggml-backend.cpp:748: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
[New LWP 11936]
[New LWP 11937]
[New LWP 11938]
[New LWP 11939]
[New LWP 11940]
[New LWP 11941]
[New LWP 11942]
[New LWP 11943]
[New LWP 11944]
[New LWP 11945]
[New LWP 11946]
[New LWP 11947]
[New LWP 20707]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f51be9db5a6 in waitpid () from /lib64/libpthread.so.0
#0  0x00007f51be9db5a6 in waitpid () from /lib64/libpthread.so.0
#1  0x000000000070a5e8 in ggml_abort ()
#2  0x000000000071dfff in ggml_backend_sched_backend_id_from_cur(ggml_backend_sched*, ggml_tensor*) ()
#3  0x000000000071ee1a in ggml_backend_sched_split_graph(ggml_backend_sched*, ggml_cgraph*) [clone .part.0] ()
#4  0x00000000007227d1 in ggml_backend_sched_alloc_graph ()
#5  0x00000000005089ce in llama_kv_cache_unified::update(llama_context&) ()
#6  0x00000000004e146f in llama_context::kv_self_update() ()
#7  0x00000000004e487e in llama_context::decode(llama_batch&) ()
#8  0x00000000004e62ea in llama_decode ()
#9  0x00000000003676da in server_context::update_slots() ()
#10 0x00000000003323dc in server_queue::start_loop() ()
#11 0x00000000003a1420 in main ()
[Inferior 1 (process 11925) detached]
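Note: the abort fires while the scheduler tries to run a CPY on the quantized K cache (cache_k_l0) during llama_kv_cache_unified::update. One way to narrow this down, offered here only as a suggestion and not something reported in this issue, is to rerun with the same settings but without -ctk q8_0 -ctv q8_0, so the KV cache stays in the default f16 type:

./llama-server -m /models/Phi-4-mini-reasoning-Q8_0.gguf -t 8 --batch-size 2048 --ubatch-size 1024 -fa --gpu-layers 99 -c 32768 --temp 0.8 --top-p 0.95 --min-p 0 --jinja

If that run survives the point where the quantized-cache configuration aborts, the problem is more likely in the q8_0 cache copy path on the Vulkan backend than in the model or the context size.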
This seems like a possible out-of-memory or 4 GB allocation limit issue.
The card has 8 GB of VRAM and I am able to run other 8 GB models that fully occupy the VRAM, so I am not sure this is due to OOM.
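As a rough sanity check on the memory theory, the quantized KV-cache footprint at -c 32768 can be estimated as below. This is only a sketch: the layer and head counts are assumptions taken from the published Phi-4-mini configuration, not read from this GGUF, and it ignores the model weights, activations, and Vulkan staging buffers.

# Back-of-the-envelope KV-cache size for -c 32768 with -ctk q8_0 -ctv q8_0.
# NOTE: n_layers, n_kv_heads and head_dim are assumptions from the published
# Phi-4-mini config, not values confirmed from this GGUF.
n_layers   = 32
n_kv_heads = 8
head_dim   = 128
n_ctx      = 32768                 # from -c 32768
bytes_per_elem_q8_0 = 34 / 32      # q8_0 block: 32 values stored in 34 bytes

kv_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem_q8_0
print(f"KV cache ~= {kv_bytes / 1024**3:.2f} GiB")   # ~2.1 GiB with these assumptions

With roughly 4 GB of Q8_0 weights on top of that (assuming a ~3.8 B-parameter model), the total stays under 8 GB, so an outright OOM is plausible but not obvious; whether any single buffer crosses a driver-side 4 GB allocation limit would be a separate question.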