
Eval bug: ROCm error: batched GEMM not supported #14576

Open
@MrGibus

Description


Name and Version

llama.cpp built from source - latest master.

llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 5836 (b9c3eefd)
built with cc (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3) for x86_64-redhat-linux

Build command:

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)"  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF && cmake --build build --config Release -- -j 16

Operating systems

Linux

GGML backends

HIP

Hardware

AMD Ryzen 5 5600X
AMD ATI Radeon RX 7900 XTX

Models

All models tested failed.
Note: non-quantized models have not been tested.

Problem description & steps to reproduce

llama-run --ngl 10 <MODEL>
fails whenever any layers are offloaded to the GPU; with --ngl 0 it functions as expected.
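For reference, the same failure can be triggered without llama-run. The sketch below is a minimal, untested reproducer against the C API in this tree (the file name and build line are illustrative, and entry points such as llama_model_load_from_file, llama_init_from_model and llama_batch_get_one may differ slightly across revisions). The only setting that matters is n_gpu_layers > 0; the abort fires on the first llama_decode, matching the backtrace in the log below.

// repro.cpp (hypothetical) -- build against this llama.cpp tree, e.g.:
//   g++ repro.cpp -Iinclude -Iggml/include -Lbuild/bin -lllama -o repro
#include "llama.h"
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]); return 1; }

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 10;                               // any value > 0 reproduces the abort
    llama_model * model = llama_model_load_from_file(argv[1], mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    llama_context_params cparams = llama_context_default_params();
    llama_context * ctx = llama_init_from_model(model, cparams);
    if (!ctx) { fprintf(stderr, "failed to create context\n"); return 1; }

    // Tokenize a short prompt and run a single decode; the ROCm abort happens here.
    const char * prompt = "hello";
    const llama_vocab * vocab = llama_model_get_vocab(model);
    std::vector<llama_token> tokens(64);
    int n = llama_tokenize(vocab, prompt, (int32_t) strlen(prompt),
                           tokens.data(), (int32_t) tokens.size(), true, false);
    if (n < 0) { fprintf(stderr, "tokenization failed\n"); return 1; }

    llama_batch batch = llama_batch_get_one(tokens.data(), n);
    int ret = llama_decode(ctx, batch);                      // aborts in ggml_cuda_mul_mat_batched_cublas
    fprintf(stderr, "llama_decode returned %d\n", ret);

    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}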

First Bad Commit

No response

Relevant log output

llama-run --ngl 10 codellama-7b.Q5_K_S.gguf
> hello
ROCm error: CUBLAS_STATUS_NOT_SUPPORTED
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas_impl at /home/Gibus/dev/repos/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1928
  hipblasGemmStridedBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, src0_ptr, cu_data_type_a, nb01/nb00, nb02/nb00, src1_ptr, cu_data_type_b, s11, s12, beta, dst_t, cu_data_type, ne0, ne1*ne0, ne12*ne13, cu_compute_type, HIPBLAS_GEMM_DEFAULT)
/home/Gibus/dev/repos/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:78: ROCm error
[New LWP 14104]
[New LWP 14103]
[New LWP 14102]
[New LWP 14096]
[New LWP 14077]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f60018ead93 in wait4 () from /lib64/libc.so.6
#0  0x00007f60018ead93 in wait4 () from /lib64/libc.so.6
#1  0x00007f600484702b in ggml_print_backtrace () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-base.so
#2  0x00007f600484717e in ggml_abort () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-base.so
#3  0x00007f6004469862 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-hip.so
#4  0x00007f6004474176 in ggml_cuda_mul_mat_batched_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-hip.so
#5  0x00007f60044710b8 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-hip.so
#6  0x00007f600446f237 in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-hip.so
#7  0x00007f600485df94 in ggml_backend_sched_graph_compute_async () from /home/Gibus/dev/repos/llama.cpp/build/bin/libggml-base.so
#8  0x00007f6004a49e61 in llama_context::graph_compute(ggml_cgraph*, bool) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libllama.so
#9  0x00007f6004a4a0ab in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libllama.so
#10 0x00007f6004a4f045 in llama_context::decode(llama_batch const&) () from /home/Gibus/dev/repos/llama.cpp/build/bin/libllama.so
#11 0x00007f6004a501cb in llama_decode () from /home/Gibus/dev/repos/llama.cpp/build/bin/libllama.so
#12 0x000000000041b926 in main ()
[Inferior 1 (process 14075) detached]
Aborted (core dumped)
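To help narrow down whether the CUBLAS_STATUS_NOT_SUPPORTED comes from hipBLAS/rocBLAS itself or from the parameters llama.cpp passes, here is a standalone sketch of an equivalent strided-batched GEMM call outside llama.cpp. It is an assumption-laden reproducer, not the exact call: the file name is hypothetical, the shapes are arbitrary, it uses the legacy hipblasDatatype_t-based overload (on newer ROCm the hipDataType names HIP_R_16F / HIP_R_32F may be required instead), and it assumes FP16 inputs with an FP32 destination and FP32 compute, whereas the types llama.cpp actually selects depend on the model and dst type.

// gemm_repro.cpp (hypothetical) -- compile with: hipcc gemm_repro.cpp -lhipblas -o gemm_repro
#include <hip/hip_runtime.h>
#include <hipblas/hipblas.h>
#include <cstdint>
#include <cstdio>

#define HIP_CHECK(x) do { hipError_t e_ = (x); \
    if (e_ != hipSuccess) { fprintf(stderr, "HIP error %d at line %d\n", (int) e_, __LINE__); return 1; } } while (0)
#define HIPBLAS_CHECK(x) do { hipblasStatus_t s_ = (x); \
    if (s_ != HIPBLAS_STATUS_SUCCESS) { fprintf(stderr, "hipBLAS status %d at line %d\n", (int) s_, __LINE__); return 1; } } while (0)

int main() {
    // Small, arbitrary shapes; the real ones are m = ne01, n = ne11, k = ne10, batched over ne12*ne13.
    const int m = 128, n = 8, k = 64, batch = 4;

    void *dA, *dB, *dC;
    HIP_CHECK(hipMalloc(&dA, sizeof(uint16_t) * (size_t) m * k * batch));  // FP16 src0
    HIP_CHECK(hipMalloc(&dB, sizeof(uint16_t) * (size_t) k * n * batch));  // FP16 src1
    HIP_CHECK(hipMalloc(&dC, sizeof(float)    * (size_t) m * n * batch));  // FP32 dst (assumption)
    HIP_CHECK(hipMemset(dA, 0, sizeof(uint16_t) * (size_t) m * k * batch));
    HIP_CHECK(hipMemset(dB, 0, sizeof(uint16_t) * (size_t) k * n * batch));

    hipblasHandle_t handle;
    HIPBLAS_CHECK(hipblasCreate(&handle));

    const float alpha = 1.0f, beta = 0.0f;
    // Mirrors the failing call shape: op(A) = A^T, op(B) = B, strided over the batch dimension.
    HIPBLAS_CHECK(hipblasGemmStridedBatchedEx(
        handle, HIPBLAS_OP_T, HIPBLAS_OP_N,
        m, n, k, &alpha,
        dA, HIPBLAS_R_16F, k, (long long) m * k,
        dB, HIPBLAS_R_16F, k, (long long) k * n,
        &beta,
        dC, HIPBLAS_R_32F, m, (long long) m * n,
        batch,
        HIPBLAS_R_32F,               // compute type (the legacy overload takes a hipblasDatatype_t here)
        HIPBLAS_GEMM_DEFAULT));

    HIP_CHECK(hipDeviceSynchronize());
    fprintf(stderr, "batched GEMM completed\n");

    hipblasDestroy(handle);
    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}

If this sketch also returns HIPBLAS_STATUS_NOT_SUPPORTED on gfx1100, the problem is likely in the hipBLAS/rocBLAS install rather than in llama.cpp's parameters; if it succeeds, the type/stride combination llama.cpp chooses is the more likely suspect.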
