Eval bug: mtmd in server mode crashes on too big image #13414

Closed
pwilkin opened this issue May 9, 2025 · 5 comments · Fixed by #13434
Comments

pwilkin commented May 9, 2025

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5331 (33eff40)
built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

i7-9700K + RTX 3080 (10 GB VRAM)

Models

Qwen2.5 VL (Q4_K_M)

Problem description & steps to reproduce

Upload a large image (about 3 MB in my case) without --no-mmproj-offload. The server crashes with an out-of-memory error.
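
For reference, a minimal reproduction sketch in Python, assuming the server is started with an --mmproj projector and exposes the OpenAI-compatible /v1/chat/completions endpoint that accepts image_url content parts; the URL, file name, and payload shape here are assumptions to adjust for your setup:

```python
# Repro sketch: send a large image to llama-server's OpenAI-compatible chat
# endpoint. Assumes the server runs at localhost:8080 with a multimodal
# projector loaded; "big_photo.jpg" is a placeholder for a high-res image.
import base64
import requests

with open("big_photo.jpg", "rb") as f:  # e.g. a ~3 MB JPEG
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
}

r = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(r.status_code, r.text[:200])
```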

First Bad Commit

33eff40

Relevant log output

srv  process_chun: processing image...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4514.05 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4733328896
/devel/tools/llama.cpp/ggml/src/ggml-backend.cpp:1662: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

ZUIcat commented May 10, 2025

I also encountered a similar issue this afternoon while using Qwen2.5-VL-7B-Instruct-Q6_K__bartowski.gguf.


ngxson commented May 10, 2025

Qwen VL is the only model without a hard cap on image size; this will need to be fixed very soon.


ZUIcat commented May 11, 2025

I tested the latest build, 5343 (62d4250e), compiled with MSVC 19.43.34810.0 for x64.
I still get errors when sending slightly larger images to Qwen2.5-VL-7B,
but Gemma-3-12B works without issues.

Here is the output:

srv  process_chun: processing image...
ggml_vulkan: Device memory allocation of size 4187092728 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4187092728
C:\_MyWorkSpace\WorkSpaceTemp\AI\Llama\llama.cpp\ggml\src\ggml-backend.cpp:1663: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed


ngxson commented May 12, 2025

Qwen VL uses an unreasonable number of tokens for large images (roughly quadratic in image dimensions); we will now have a maximum limit of 1024x1024 for Qwen VL models: #13478

The problem is not isolated to llama.cpp; other runtimes have the same issue: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/discussions/10
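
To illustrate the scaling, a back-of-the-envelope sketch, assuming the usual Qwen2-VL-style preprocessing of 14-pixel patches with a 2x2 spatial merge (one token per 28x28-pixel block); the exact preprocessing in llama.cpp may round or clamp dimensions differently:

```python
# Rough image-token estimate for a Qwen2.5-VL-style vision encoder.
# Assumes patch_size=14 and a 2x2 spatial merge, i.e. one token per
# 28x28-pixel block; real preprocessing also snaps dimensions to multiples.
import math

def approx_image_tokens(width: int, height: int,
                        patch: int = 14, merge: int = 2) -> int:
    block = patch * merge  # 28 pixels per token along each axis
    return math.ceil(width / block) * math.ceil(height / block)

for w, h in [(512, 512), (1024, 1024), (2048, 2048), (4032, 3024)]:
    print(f"{w}x{h}: ~{approx_image_tokens(w, h)} image tokens")
# The token count grows quadratically with the image side length, which is
# why very large photos blow up the compute-buffer allocation.
```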


pwilkin commented May 12, 2025

@ngxson Makes sense. Will there be automatic rescaling of bigger images to fit (or an option for it)?
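
Until such an option exists, a minimal client-side workaround sketch, assuming Pillow is installed; it downscales the image so its longest side fits within the 1024-pixel bound mentioned above before uploading (file names are placeholders):

```python
# Workaround sketch: shrink an image so its longest side is at most 1024 px
# before sending it to the server. Requires Pillow (pip install pillow).
from PIL import Image

def downscale_for_upload(src: str, dst: str, max_side: int = 1024) -> None:
    img = Image.open(src)
    img.thumbnail((max_side, max_side))  # resizes in place, keeps aspect ratio
    img.save(dst, quality=90)

downscale_for_upload("big_photo.jpg", "big_photo_small.jpg")
```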
