GGML_ASSERT(cur_p->size > 0) failed, or gibberish on DeepSeek V3 0324 (Q2_K_XL), CUDA + CPU #13461
Hi there! I found that I got this issue when trying to use higher values of -b and -ub with DeepSeek V3, as doing so increases PP performance a lot. After hitting the errors in the title, I tried setting the batch sizes back to the default values, but the issues still happen.

Setup is 5090 + 4090 x2 + A6000, Ryzen 7 7800X3D, 192GB RAM, Fedora 42 (built llama.cpp with GCC 14).

Log is
Comments
Just an update: it now seems to happen with any batch size/ubatch size.
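For context, these are the knobs in question (a sketch; the binary name, model path, and sizes are illustrative, and current builds default to -b 2048 / -ub 512):

```bash
# Larger logical (-b) and physical (-ub) batch sizes, as used to speed up
# prompt processing before the crashes started:
./build/bin/llama-server -m DeepSeek-V3-0324-Q2_K_XL.gguf -b 4096 -ub 2048

# Back at the defaults, where the assert reportedly still fires:
./build/bin/llama-server -m DeepSeek-V3-0324-Q2_K_XL.gguf -b 2048 -ub 512
```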
Okay, I found the commit where the issue started: 0208355. If I revert that commit, it works fine.
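For anyone else hunting a regression like this, the usual way to pin down the first bad commit is git bisect; a minimal sketch, assuming a CUDA CMake build and a prompt that reliably triggers the assert (the known-good ref and model path are placeholders):

```bash
git bisect start
git bisect bad master                # assert fires here
git bisect good <last-known-good>    # e.g. a tag or commit that worked

# At each commit git checks out, rebuild and test:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
./build/bin/llama-cli -m model.gguf -p "test prompt" -n 64 \
    && git bisect good || git bisect bad

git bisect reset                     # restore HEAD when finished
```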
Could it be related to
I am seeing the same issue. I am not able to reproduce it reliably: the same input that caused it to crash will work once I restart the server, but after a few more requests it will crash again.

Edit: Also, it seems that R1 is suddenly doing a lot more reasoning (e.g. a dozen lines of "Alternatively, ..." in a row). Could be placebo, or could be related to the idea of "gibberish" output.

Edit 2: OK, I am seeing straight gibberish in some reasoning/responses too... here are some excerpts from a response where it seemingly lost the plot entirely:
Crash log:
My command-line for llama.cpp is:
This was built against master (9a390c4) as follows:
I'm seeing the same issue; it has nothing to do with any parameters, it's a bug in the code. With DeepSeek V3 as well, but Q3_K_XL.

main: server is listening on http://0.0.0.0:8089 - starting the main loop
I am having better results running without GPUs for now. So that CUDA FA change mentioned by @Panchovix seems plausible at least.
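For reference, two ways to take the suspect CUDA FA path out of the picture while testing (a sketch; the model path is a placeholder, and -fa here refers to the opt-in flash-attention flag of builds from that period):

```bash
# CPU-only: keep zero layers on the GPUs, ruling out the CUDA path entirely
./build/bin/llama-server -m model.gguf -ngl 0

# GPUs on, but flash attention left disabled by simply omitting -fa
./build/bin/llama-server -m model.gguf -ngl 99
```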
I just tested, and so far PR #13469 seems to have fixed the issue. It is now merged on master. Closing, as @JohannesGaessler commented that it should be fixed as well.
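For anyone landing on this issue later, picking up the fix should just be a matter of rebuilding against current master (a sketch, assuming a CUDA CMake build):

```bash
git checkout master && git pull
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```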