-
Notifications
You must be signed in to change notification settings - Fork 11.8k
CUDA: fix crash with partial offloading of MoE #13439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA: fix crash with partial offloading of MoE #13439
Conversation
What is the view? I am not able to reproduce the linked issue, and in the qwen3moe graph I don't see any views (of weights). |
I was able to reproduce it by building for a lower cc. There is no view in the graph, this is caused by the slice created in diff --git a/ggml/src/ggml-cuda/ggml-cuda.cu b/ggml/src/ggml-cuda/ggml-cuda.cu
index 7643c4b8b..f8a4e7216 100644
--- a/ggml/src/ggml-cuda/ggml-cuda.cu
+++ b/ggml/src/ggml-cuda/ggml-cuda.cu
@@ -2071,8 +2071,8 @@ static void ggml_cuda_mul_mat_id(ggml_backend_cuda_context & ctx, ggml_tensor *
ggml_tensor src0_slice = *src0;
src0_slice.ne[2] = 1;
src0_slice.nb[3] = src0_slice.nb[2];
- src0_slice.op = GGML_OP_VIEW;
- src0_slice.view_src = dst->src[0]; // non-const pointer to src0
+ src0_slice.op = GGML_OP_NONE;
+ src0_slice.view_src = nullptr;
src0_slice.data = (char *) src0->data + i02*nb02;
ggml_tensor src1_slice; |
It's correct that an assert is being triggered because clearing the supposed padding would actually be unsafe in that situation. It's just that I did not correctly consider that the effective number of tokens for the slice of |
2e93643
to
d5d74a2
Compare
I relaxed the condition a bit because for |
This doesn't fix the issue because the assert is still triggered. IMO the correct solution here would be to clear the padding of the |
d5d74a2
to
4bc8f75
Compare
I think the asserts were also in the wrong place. There were cases where they were being triggered even if |
This reverts commit 7474e00.
Fixes #13437 .
There is an edge case for clearing the padding that I did not consider. I don't think this edge case can be sensibly fixed other than by falling back to cuBLAS where no padding is needed.