Misc. bug: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)) failed #13359
Comments
I tried several models (older ones and "thinking" models) - the same result.
This model doesn't work (bug):
And this model works! (no bug):
@ggerganov I have tracked this to
I think that this should be enough:

diff --git a/src/llama-kv-cache.cpp b/src/llama-kv-cache.cpp
index 3dcad65bb..8e160a193 100644
--- a/src/llama-kv-cache.cpp
+++ b/src/llama-kv-cache.cpp
@@ -441,6 +441,7 @@ void llama_kv_cache_unified::defrag_sched(float thold) {
 void llama_kv_cache_unified::set_full() {
     n = size;
+    head = 0;
 }
 llama_sbatch llama_kv_cache_unified::sbatch_init(

Haven't tested, just a guess atm. Will look into this further.
For more context: it happens after a shift + defrag; a shift alone is not enough, so it requires a context of at least 2048 to trigger, and a batch size of at least half the context size. The proposed change fixes it from what I can tell.
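For anyone following along, here is a rough standalone sketch of the arithmetic behind the failing assert (editor's illustration, not llama.cpp code; the cell count and row size are invented), under the assumption that the unified KV cache builds its views starting at cell head and spanning n cells: if set_full() sets n = size while head is still non-zero after a shift + defrag, the view runs past the end of the cache buffer, which is exactly the condition the GGML_ASSERT rejects.

// Rough sketch only (not llama.cpp code): illustrates the failing check with
// made-up sizes, assuming the KV view starts at cell `head` and spans `n`
// cells of `row_bytes` bytes inside a buffer of `size` cells.
#include <cstddef>
#include <cstdio>

int main() {
    const size_t size      = 4096;             // total KV cells in the cache tensor
    const size_t row_bytes = 1024;             // hypothetical bytes per cell
    const size_t nbytes    = size * row_bytes; // analogue of ggml_nbytes(view_src)

    size_t head = 1500;                        // left over after shift + defrag
    size_t n    = size;                        // set_full(): n = size

    size_t view_offs = head * row_bytes;
    size_t data_size = n * row_bytes;

    // Mirrors the check: data_size + view_offs <= ggml_nbytes(view_src)
    printf("head=%zu -> assert %s\n", head,
           data_size + view_offs <= nbytes ? "passes" : "fails");

    head      = 0;                             // the proposed fix: also reset head
    view_offs = head * row_bytes;
    printf("head=%zu -> assert %s\n", head,
           data_size + view_offs <= nbytes ? "passes" : "fails");
    return 0;
}

With head reset to 0, a full-size view fits exactly within the buffer again.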
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080 Laptop GPU, compute capability 8.6, VMM: yes
load_backend: loaded CUDA backend from c:\D_Drive\Copies\ML_MODELS\llama-b5293-bin-win-cuda-cu12.4-x64\ggml-cuda.dll
load_backend: loaded RPC backend from c:\D_Drive\Copies\ML_MODELS\llama-b5293-bin-win-cuda-cu12.4-x64\ggml-rpc.dll
load_backend: loaded CPU backend from c:\D_Drive\Copies\ML_MODELS\llama-b5293-bin-win-cuda-cu12.4-x64\ggml-cpu-haswell.dll
version: 5293 (1e333d5)
built with MSVC 19.29.30159.0 for Windows AMD64
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
It seems that when the total number of tokens (prompt parsing + generation) approaches or exceeds the n_ctx_slot limit, llama-server crashes instead of interrupting the response, with the following error:
ggml\src\ggml.c:1554: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src)) failed
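For reference, this assert fires whenever a view tensor is created whose data would extend past its parent tensor. A minimal, hedged illustration follows (editor's sketch, not the server code path; it assumes the ggml headers and library are available, and the tensor sizes are arbitrary):

// Deliberately triggers the same class of assert: a view whose
// offset + size exceeds ggml_nbytes() of the parent tensor.
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Parent tensor: 1024 f32 values = 4096 bytes.
    struct ggml_tensor * kv = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

    // A view of 1024 elements (4096 bytes) starting 2048 bytes in needs
    // 6144 bytes > 4096, so ggml aborts with the GGML_ASSERT shown above.
    ggml_view_1d(ctx, kv, 1024, 512 * sizeof(float));

    ggml_free(ctx);
    return 0;
}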
First Bad Commit
No response
Relevant log output