
CUDA: fix crash with partial offloading of MoE #13439


Merged

Conversation

JohannesGaessler
Collaborator

Fixes #13437.

There is an edge case for clearing the padding that I did not consider. I don't think this edge case can be sensibly handled other than by falling back to cuBLAS, which does not require any padding.

@github-actions bot added labels: Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) — May 10, 2025
@slaren
Member

slaren commented May 10, 2025

What is the view? I am not able to reproduce the linked issue, and in the qwen3moe graph I don't see any views (of weights).

@slaren
Member

slaren commented May 10, 2025

I was able to reproduce it by building for a lower cc. There is no view in the graph, this is caused by the slice created in ggml_cuda_mul_mat_id. This also fixes it:

diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index 7643c4b8b..f8a4e7216 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -2071,8 +2071,8 @@ static void ggml_cuda_mul_mat_id(ggml_backend_cuda_context & ctx, ggml_tensor *
         ggml_tensor src0_slice = *src0;
         src0_slice.ne[2]    = 1;
         src0_slice.nb[3]    = src0_slice.nb[2];
-        src0_slice.op       = GGML_OP_VIEW;
-        src0_slice.view_src = dst->src[0]; // non-const pointer to src0
+        src0_slice.op       = GGML_OP_NONE;
+        src0_slice.view_src = nullptr;
         src0_slice.data     = (char *) src0->data + i02*nb02;

         ggml_tensor src1_slice;

@JohannesGaessler
Collaborator Author

It's correct that an assert is being triggered because clearing the supposed padding would actually be unsafe in that situation. It's just that I did not correctly consider that the effective number of tokens for the slice of src0 could fall into a range where the code in ggml_cuda_mul_mat could decide to use MMVQ or MMQ after all.

@JohannesGaessler force-pushed the cuda-fix-partial-mmid branch from 2e93643 to d5d74a2 on May 10, 2025 21:58
@JohannesGaessler
Collaborator Author

I relaxed the condition a bit because for src0->ne[0] % MATRIX_ROW_PADDING == 0 it would still be fine to use MMVQ or MMQ.

@slaren
Member

slaren commented May 10, 2025

This doesn't fix the issue because the assert is still triggered. IMO the correct solution here would be to clear the padding of the view_src, but only if actually necessary.

@JohannesGaessler force-pushed the cuda-fix-partial-mmid branch from d5d74a2 to 4bc8f75 on May 10, 2025 22:18
@JohannesGaessler
Collaborator Author

I think the asserts were also in the wrong place: there were cases where they were being triggered even when cudaMemsetAsync was not called.

@JohannesGaessler merged commit 7474e00 into ggml-org:master May 11, 2025
44 checks passed
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request May 14, 2025

Successfully merging this pull request may close these issues.

Eval bug: mmvq.cu:519: GGML_ASSERT(!src0->view_src) failed
2 participants