Eval bug: mtmd in server mode crashes on too big image #13414
Comments
I also encountered a similar issue when using Qwen2.5-VL-7B-Instruct-Q6_K__bartowski.gguf this afternoon.
Qwen VL is the only model without a hard cap on image size; this will need to be fixed very soon.
I tested the latest version, build 5343 (62d4250e), with MSVC 19.43.34810.0 for x64. Here is the output:
srv process_chun: processing image...
ggml_vulkan: Device memory allocation of size 4187092728 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4187092728
C:\_MyWorkSpace\WorkSpaceTemp\AI\Llama\llama.cpp\ggml\src\ggml-backend.cpp:1663: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Qwen VL uses an unreasonable number of tokens for large images (roughly quadratic scaling), so we will now enforce a maximum of 1024x1024 for Qwen VL models, see #13478. The problem is not isolated to llama.cpp; other runtimes hit the same issue: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/discussions/10
@ngxson Makes sense - will there be auto-rescaling of bigger images to fit (or an option for it)?
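For context on why the allocations get so large: as far as I understand the Qwen2-VL preprocessing (14x14 ViT patches merged 2x2, so roughly one visual token per 28x28 pixel block), a 4032x3024 photo needs on the order of 144 x 108 ≈ 15,500 image tokens, versus roughly 1,300 at 1024x1024, and the encoder's compute buffers grow accordingly. Until the server-side cap lands, one workaround is to downscale images on the client before uploading. A minimal sketch, assuming Pillow is available; the 1024-pixel budget and file names are only illustrative:

```python
# Client-side workaround sketch: shrink an image before uploading it to
# llama-server so the vision encoder never sees an oversized input.
# Assumes Pillow; MAX_SIDE mirrors the cap discussed in #13478 and is not
# an official llama.cpp constant.
from PIL import Image

MAX_SIDE = 1024  # assumed per-dimension budget


def shrink_for_vlm(path_in: str, path_out: str) -> None:
    """Downscale the image, preserving aspect ratio, if either side exceeds MAX_SIDE."""
    img = Image.open(path_in)
    if max(img.size) > MAX_SIDE:
        # thumbnail() resizes in place and never upscales
        img.thumbnail((MAX_SIDE, MAX_SIDE), Image.LANCZOS)
    img.save(path_out)


shrink_for_vlm("huge_photo.jpg", "huge_photo_small.jpg")  # placeholder file names
```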
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5331 (33eff40)
built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
i7-9700K + RTX 3080 10GB VRAM
Models
Qwen2.5 VL (Q4_K_M)
Problem description & steps to reproduce
Upload a big image (e.g. ~3 MB in my case) without --no-mmproj-offload. The server crashes with an out-of-memory error.
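For reference, a minimal way to trigger the request from a script. This is only a sketch: it assumes llama-server was started with the model and its mmproj file, and that the OpenAI-compatible /v1/chat/completions endpoint is used on the default port 8080; the file name and prompt are placeholders.

```python
# Reproduction sketch: send one large image to a running llama-server
# (assumed OpenAI-compatible endpoint on the default port; placeholders below).
import base64
import requests

with open("big_image.jpg", "rb") as f:  # placeholder ~3 MB photo
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }]
    },
    timeout=600,
)
print(resp.status_code, resp.text[:500])
```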
First Bad Commit
33eff40
Relevant log output
srv process_chun: processing image...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4514.05 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4733328896
/devel/tools/llama.cpp/ggml/src/ggml-backend.cpp:1662: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)