Phi-4-mini reasoning CRASH!!! (Vulkan) #13464

acbits opened this issue May 12, 2025 · 2 comments

acbits commented May 12, 2025

Name and Version

llama-cli --version
version: 5336 (053367d)
built with gcc-12.4 (GCC) 12.4.0 for x86_64-redhat-linux

Operating systems

Linux

GGML backends

Vulkan

Hardware

AMD RX 7600

Models

Phi-4-mini-reasoning-Q8_0.gguf

Problem description & steps to reproduce

The server crashes after a few minutes of inference.

./llama-server -m /models/Phi-4-mini-reasoning-Q8_0.gguf -t 8 --batch-size 2048 --ubatch-size 1024 -fa -ctk q8_0 -ctv q8_0 --gpu-layers 99 -c 32768 --temp 0.8 --top-p 0.95 --min-p 0 --jinja
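One way to narrow this down (a suggestion only, not a confirmed cause: the abort in the log below points at a CPY on a quantized K-cache view in a Vulkan buffer) would be to rerun with the same options but without the quantized KV cache, which falls back to the default f16 cache:

./llama-server -m /models/Phi-4-mini-reasoning-Q8_0.gguf -t 8 --batch-size 2048 --ubatch-size 1024 -fa --gpu-layers 99 -c 32768 --temp 0.8 --top-p 0.95 --min-p 0 --jinja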

First Bad Commit

No response

Relevant log output

/media/build/llama/ggml/src/ggml-backend.cpp:748: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY)
[New LWP 11936]
[New LWP 11937]
[New LWP 11938]
[New LWP 11939]
[New LWP 11940]
[New LWP 11941]
[New LWP 11942]
[New LWP 11943]
[New LWP 11944]
[New LWP 11945]
[New LWP 11946]
[New LWP 11947]
[New LWP 20707]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f51be9db5a6 in waitpid () from /lib64/libpthread.so.0
#0  0x00007f51be9db5a6 in waitpid () from /lib64/libpthread.so.0
#1  0x000000000070a5e8 in ggml_abort ()
#2  0x000000000071dfff in ggml_backend_sched_backend_id_from_cur(ggml_backend_sched*, ggml_tensor*) ()
#3  0x000000000071ee1a in ggml_backend_sched_split_graph(ggml_backend_sched*, ggml_cgraph*) [clone .part.0] ()
#4  0x00000000007227d1 in ggml_backend_sched_alloc_graph ()
#5  0x00000000005089ce in llama_kv_cache_unified::update(llama_context&) ()
#6  0x00000000004e146f in llama_context::kv_self_update() ()
#7  0x00000000004e487e in llama_context::decode(llama_batch&) ()
#8  0x00000000004e62ea in llama_decode ()
#9  0x00000000003676da in server_context::update_slots() ()
#10 0x00000000003323dc in server_queue::start_loop() ()
#11 0x00000000003a1420 in main ()
[Inferior 1 (process 11925) detached]
JohannesGaessler changed the title from "Phi-4-mini reasoning CRASH!!!" to "Phi-4-mini reasoning CRASH!!! (Vulkan)" on May 12, 2025
jeffbolznv (Collaborator) commented:

This seems like a possible out of memory or 4GB limit issue.
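The per-allocation limit mentioned here can be read from the driver's reported device properties; a quick check (assuming the vulkaninfo tool from the Vulkan SDK is installed):

vulkaninfo | grep -i -e maxMemoryAllocationSize -e maxBufferSize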

acbits commented May 14, 2025

The card has 8 GB of VRAM and I am able to run other 8 GB models that fully occupy the VRAM, so I am not sure it is due to OOM.
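Actual VRAM usage during inference can be watched via the amdgpu sysfs counters (a sketch; the card index in the path may differ on other systems):

watch -n1 'cat /sys/class/drm/card0/device/mem_info_vram_used /sys/class/drm/card0/device/mem_info_vram_total'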
