Eval bug: mtmd in server mode crashes on too big image #13414
Comments
I also encountered a similar issue when using Qwen2.5-VL-7B-Instruct-Q6_K__bartowski.gguf this afternoon.
Qwen VL is the only model without a hard cap on image size; this will need to be fixed very soon.
I tested the latest version, build 5343 (62d4250e), with MSVC 19.43.34810.0 for x64. Here is the output:
srv process_chun: processing image...
ggml_vulkan: Device memory allocation of size 4187092728 failed.
ggml_vulkan: Requested buffer size exceeds device memory allocation limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4187092728
C:\_MyWorkSpace\WorkSpaceTemp\AI\Llama\llama.cpp\ggml\src\ggml-backend.cpp:1663: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Qwen VL uses an unreasonable number of tokens for large images (roughly quadratic scaling), so we will now enforce a maximum of 1024x1024 for Qwen VL models, see #13478. The problem is not isolated to llama.cpp; other runtimes hit the same issue: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/discussions/10
@ngxson Makes sense - will there be auto-rescaling of bigger images to fit (or an option for it)?
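For context on why the allocations get so large: as far as I understand the Qwen2-VL preprocessing (14x14 ViT patches merged 2x2, so roughly one visual token per 28x28 pixel block), a 4032x3024 photo needs on the order of 144 x 108 ≈ 15,500 image tokens, versus roughly 1,300 at 1024x1024, and the encoder's compute buffers grow accordingly. Until the server-side cap lands, one workaround is to downscale images on the client before uploading. A minimal sketch, assuming Pillow is available; the 1024-pixel budget and file names are only illustrative:

```python
# Client-side workaround sketch: shrink an image before uploading it to
# llama-server so the vision encoder never sees an oversized input.
# Assumes Pillow; MAX_SIDE mirrors the cap discussed in #13478 and is not
# an official llama.cpp constant.
from PIL import Image

MAX_SIDE = 1024  # assumed per-dimension budget


def shrink_for_vlm(path_in: str, path_out: str) -> None:
    """Downscale the image, preserving aspect ratio, if either side exceeds MAX_SIDE."""
    img = Image.open(path_in)
    if max(img.size) > MAX_SIDE:
        # thumbnail() resizes in place and never upscales
        img.thumbnail((MAX_SIDE, MAX_SIDE), Image.LANCZOS)
    img.save(path_out)


shrink_for_vlm("huge_photo.jpg", "huge_photo_small.jpg")  # placeholder file names
```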
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
version: 5331 (33eff40)
built with cc (Ubuntu 14.2.0-4ubuntu2) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
i7-9700K + RTX 3080 10GB VRAM
Models
Qwen2.5 VL (Q4_K_M)
Problem description & steps to reproduce
Upload a big image (e.g. ~3 MB in my case) without --no-mmproj-offload. The server crashes with an out-of-memory error.
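For reference, a minimal way to trigger the request from a script. This is only a sketch: it assumes llama-server was started with the model and its mmproj file, and that the OpenAI-compatible /v1/chat/completions endpoint is used on the default port 8080; the file name and prompt are placeholders.

```python
# Reproduction sketch: send one large image to a running llama-server
# (assumed OpenAI-compatible endpoint on the default port; placeholders below).
import base64
import requests

with open("big_image.jpg", "rb") as f:  # placeholder ~3 MB photo
    b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }]
    },
    timeout=600,
)
print(resp.status_code, resp.text[:500])
```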
First Bad Commit
33eff40
Relevant log output
srv process_chun: processing image...
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 4514.05 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA0 buffer of size 4733328896
/devel/tools/llama.cpp/ggml/src/ggml-backend.cpp:1662: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)