
Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed #13405


Open

bjodah opened this issue May 9, 2025 · 3 comments


bjodah commented May 9, 2025

Name and Version

$./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 5329 (611aa914)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

llama-cli \
    --log-file /tmp/llamacpp-Qwen3-30B-A3B-Q8_K_XL.log \
    --hf-repo unsloth/Qwen3-30B-A3B-GGUF:Q8_K_XL \
    --override-tensor '([0-9]+).ffn_.*_exps.=CPU' \
    --n-gpu-layers 48 \
    --jinja \
    --cache-type-k q8_0 \
    --ctx-size 32768 \
    --samplers "top_k;dry;min_p;temperature;top_p" \
    --min-p 0.005 \
    --top-p 0.97 \
    --top-k 40 \
    --temp 0.7 \
    --dry-multiplier 0.7 \
    --dry-allowed-length 4 \
    --dry-penalty-last-n 2048 \
    --presence-penalty 0.05 \
    --frequency-penalty 0.005 \
    --repeat-penalty 1.01 \
    --repeat-last-n 16 \
    --verbose \
    --file generic-prompt-for-testing-1906words.txt

Problem description & steps to reproduce

The log file of the output, together with what I hope is all the relevant information, can be found in this ephemeral repo I put up for this bug report:
https://github.com/bjodah/bug-reproducer-llamacpp-assert-triggering/tree/main

It might very well be that I'm doing something awfully wrong here, but since it's an assert that is triggering, I figure you might be interested in a bug report.
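For anyone not familiar with that assert: here is a minimal sketch, as I understand it, of the invariant it guards (hypothetical code on my part, not the actual llama-sampling.cpp). Each sampler in the chain may shrink the candidate token list, and the stage that finally picks a token requires at least one candidate to survive.

// Hypothetical sketch of the invariant behind GGML_ASSERT(cur_p->size > 0);
// this is my reconstruction, not the actual llama.cpp source.
#include <cassert>
#include <vector>

struct token_data { int id; float p; };

// A filtering sampler erases candidates below some probability threshold...
void filter_candidates(std::vector<token_data> & cur, float threshold) {
    std::erase_if(cur, [&](const token_data & t) { return t.p < threshold; });
}

// ...and the next stage asserts that something survived. If an earlier
// sampler (or an out-of-range parameter) removed everything, it aborts here.
int pick_token(const std::vector<token_data> & cur) {
    assert(!cur.empty() && "no candidates left to sample from");
    return cur.front().id; // stand-in for the real softmax/dist sampling
}

int main() {
    std::vector<token_data> cur = {{42, 0.6f}, {7, 0.4f}};
    filter_candidates(cur, 0.7f); // too aggressive: empties the list
    return pick_token(cur);       // the assert fires here
}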

I first observed this error using llama-server on my laptop (Ubuntu 24.04, GeForce 1050 Mobile), but everything in this bug report was reproduced on a more modern system (Debian, GeForce RTX 3090).

First Bad Commit

Qwen 3 support is pretty recent, so I haven't figured out the oldest relevant commit for a bisection.

Relevant log output

/... lots of output, see log file in repo linked in issue description .../ 
eval: [ 'G':38 ]
Gn_past = 2620
/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
/home/bjorn/.gdbinit:2: Error in sourced command file:
/home/bjorn/dotfiles/per-file/.gdbinit:22: Error in sourced command file:
Scripting in the "Python" language is not supported in this copy of GDB.
ptrace: Operation not permitted.
No stack.
The program is not being run.

bjodah commented May 9, 2025

...I should have added a --seed flag, but the issue is reproducible for me with all seeds I've tried so far.

The issue has to do with --dry-allowed-length 4:

...
Now finish your task according to taskDefinition, only write the poem, add no commentary.
assistant
GGGG/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed

If I adjust this to --dry-allowed-length 9, we see nine capital Gs before the assert:

...
Now finish your task according to taskDefinition, only write the poem, add no commentary.
assistant
GGGGGGGGG/home/bjorn/vc/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
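For what it's worth, my mental model of DRY (a rough sketch under my own assumptions, not the actual implementation) is that repeats up to --dry-allowed-length are unpenalized and the penalty then grows exponentially with the excess length, which would explain why the number of Gs tracks the flag exactly:

// Rough mental model of the DRY penalty, not the actual llama.cpp code:
// repeats shorter than allowed_length are free, then the penalty ramps up.
#include <cmath>
#include <cstdio>

float dry_penalty(int repeat_len, int allowed_length,
                  float multiplier, float base = 1.75f) {
    if (repeat_len < allowed_length) {
        return 0.0f; // e.g. the first 4 'G's with --dry-allowed-length 4
    }
    return multiplier * std::pow(base, float(repeat_len - allowed_length));
}

int main() {
    for (int len = 1; len <= 8; ++len) {
        std::printf("repeat_len=%d penalty=%.3f\n",
                    len, dry_penalty(len, /*allowed_length=*/4, /*multiplier=*/0.7f));
    }
}

If that model is roughly right, the crash coincides with the penalty first kicking in, which is why I suspect the DRY flags.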


segmond commented May 12, 2025

I'm seeing this bug as well, and I'm not passing in --dry-allowed-length 4.

main: server is listening on http://0.0.0.0:8089 - starting the main loop
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 16128, n_keep = 0, n_prompt_tokens = 88
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 88, n_tokens = 88, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 88, n_tokens = 88
/home/seg/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
dsv3b.sh: line 8: 68577 Aborted (core dumped) ~/llama.cpp/build/bin/llama-server -ngl 62 --host 0.0.0.0 --path ~/llama.cpp/examples/server/public -m /llmzoo/models/DeepSeek-V3-0324-UD-Q3_K_XL.gguf --port 8089 --override-tensor "blk.([0-4]).ffn_(up|down)_exp.=CUDA0,blk.([1][0257]|[5]).ffn_(up|down)_exp.=CUDA1,blk.([2][0257]|[6]).ffn_(up|down)_exp.=CUDA2,blk.([3][0257]|[7]).ffn_(up|down)_exp.=CUDA3,blk.([4][0257]|[6][01]).ffn_(up|down)_exp.=CUDA4,blk.([5][02579]|[6][2]).ffn_(up|down)_exp.=CUDA5,blk.([8-9]|[1-9][0-9]).ffn.exp.=CPU" -md ~/models/draft/DeepSeek-V3-0324-DRAFT-0.5B-Q8_0.gguf -ngld 127 -devd CUDA2 -cd 16000 -fa -mg 4 --no-mmap -c 16000


michmill1970 commented May 13, 2025

I can confirm the same behavior on macOS.
Version: llama-b5353-bin-macos-arm64.zip
macOS: 15.4.1 (24E263)

Error:

que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 1
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 2, front = 0
slot update_slots: id 0 | task 0 | kv cache rm [347, end)
srv process_chun: processing image...
image/slice encoded in 21169 ms
decoding image batch 1/1, n_tokens_batch = 256
set_causal_attn: value = 0
image decoded (batch 1/1) in 6587 ms
set_causal_attn: value = 1
srv process_chun: image processed in 27757 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 609, n_tokens = 6, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 609, n_tokens = 6
srv update_slots: decoding batch, n_tokens = 6
set_embeddings: value = 0
clear_adapter_lora: call
/Users/runner/work/llama.cpp/llama.cpp/src/llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
zsh: abort /Users/myuserdir/Projects/llamacpp/bin/llama-server --model --mmproj 4096

Command line used to start llama-server:

/Users/myuserdir/Projects/llamacpp/bin/llama-server
--model /Users/myuserdir/Projects/ImageIndexer/resources/Qwen2-VL-2B-Instruct-Q6_K.gguf
--mmproj /Users/myuserdir/Projects/ImageIndexer/resources/mmproj-Qwen2-VL-2B-Instruct-f16.gguf
--ctx-size 4096
-v

JSON payload:

request: {
  "max_tokens": 250,
  "messages": [
    {
      "content": "You describe the image and generate keywords.",
      "role": "system"
    },
    {
      "content": [
        {
          "text": "The tasks are to describe the image and to come up with a large set of keyword tags for it.\n\nWrite the Description using the active voice.\n\nThe Keywords must be one or two words each. Generate as many Keywords as possible using a controlled and consistent vocabulary.\n\nFor both Description and Keywords, make sure to include:\n\n - Themes, concepts\n - Items, animals, objects\n - Structures, landmarks, setting\n - Foreground and background elements\n - Notable colors, textures, styles\n - Actions, activities\n\nIf humans are present, include: \n - Physical appearance\n - Gender\n - Clothing\n - Age range\n - Visibly apparent ancestry\n - Occupation/role\n - Relationships between individuals\n - Emotions, expressions, body language\n\nUse ENGLISH only. Generate ONLY a JSON object with the keys Description and Keywords as follows {"Description": str, "Keywords": []}\n\nThe example input would be a stock photo of two apples, one red and one green, against a white backdrop and is a hypothetical Description and Keyword for a non-existent image.\nOUTPUT=json{\"Description\": \"Two apples next to each other, one green and one red, placed side by side against a white background. There is even and diffuse studio lighting. The fruit is glossy and covered with dropplets of water indicating they are fresh and recently washed. The image emphasizes the cleanliness and appetizing nature of the food\", \"Keywords\": [\"studio shot\",\"green\",\"fruit\",\"red\",\"apple\",\"stock image\",\"health food\",\"appetizing\",\"empty background\",\"grocery\",\"food\",\"snack\"]}\n ",
          "type": "text"
        },
        {
          "image_url": {
            "url": "data:image/jpeg;base64,...image content in base64 here..."
          },
          "type": "image_url"
        }
      ],
      "role": "user"
    }
  ],
  "min_p": 1.05,
  "temperature": 0.1,
  "top_k": 0,
  "top_p": 1
}
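Side note: min_p is 1.05 in that payload, which is above the usual [0, 1] range. Assuming min-p keeps only tokens whose probability is at least min_p times the top probability (my assumption about the filter, not the server's actual code), a value above 1 rejects even the most likely token, which on its own would leave an empty candidate list:

// Sketch under the assumption that min-p keeps tokens with
// p >= min_p * max_p; not the actual llama.cpp implementation.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> probs = {0.60f, 0.25f, 0.10f, 0.05f};
    const float min_p  = 1.05f; // from the JSON payload above
    const float max_p  = *std::max_element(probs.begin(), probs.end());
    const float cutoff = min_p * max_p; // 0.63: even the top token fails

    int kept = 0;
    for (float p : probs) {
        if (p >= cutoff) ++kept;
    }
    std::printf("kept %d of %zu candidates\n", kept, probs.size()); // kept 0 of 4
}

If that is what happens here, clamping or rejecting out-of-range sampling parameters would be a friendlier failure mode than the assert.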
