CUDA: fix crash on large batch size for quant. MoE #13537

JohannesGaessler · 2025-05-14T12:21:59Z

Should fix the issue described in #13435 (comment).

This PR swaps the x and y dimensions of the CUDA grid used for quantizing the activations, since the x dimension has a higher maximum size (2^31 - 1 blocks, versus 65535 for y).
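To illustrate why the swap helps, here is a minimal host-side sketch in plain C++ (no GPU required) that checks a grid against the CUDA limits for compute capability 3.0 and newer. The names `GridDims`, `grid_fits`, `blocks_per_row`, `grid_rows_on_y`, and `grid_rows_on_x` are hypothetical and are not taken from the llama.cpp source. The "before" layout is an assumption inferred from the description above.

```cpp
#include <cassert>
#include <cstdint>

// CUDA grid size limits for compute capability 3.0 and newer.
constexpr int64_t MAX_GRID_X = 2147483647; // 2^31 - 1
constexpr int64_t MAX_GRID_Y = 65535;
constexpr int64_t MAX_GRID_Z = 65535;

// Hypothetical stand-in for CUDA's dim3, widened to 64 bits so that
// out-of-range values can be represented and checked on the host.
struct GridDims {
    int64_t x, y, z;
};

// Returns true if a kernel launch with this grid would be within the limits.
bool grid_fits(const GridDims & g) {
    return g.x >= 1 && g.x <= MAX_GRID_X &&
           g.y >= 1 && g.y <= MAX_GRID_Y &&
           g.z >= 1 && g.z <= MAX_GRID_Z;
}

// Number of blocks needed to cover one row of ne0 values when each thread
// handles 4 values (the "*4" in the kernel's i0 computation).
int64_t blocks_per_row(int64_t ne0, int64_t block_size) {
    assert(ne0 > 0 && block_size > 0);
    const int64_t vals_per_block = 4 * block_size;
    return (ne0 + vals_per_block - 1) / vals_per_block; // round up
}

// Assumed layout before the swap: the row count sits on y (limit 65535).
GridDims grid_rows_on_y(int64_t ne0, int64_t nrows, int64_t block_size) {
    return {blocks_per_row(ne0, block_size), nrows, 1};
}

// Layout after the swap: the row count sits on x (limit 2^31 - 1).
GridDims grid_rows_on_x(int64_t ne0, int64_t nrows, int64_t block_size) {
    return {nrows, blocks_per_row(ne0, block_size), 1};
}
```

For example, with 4096 values per row, 100000 rows, and an assumed block size of 128 threads, `grid_rows_on_y` puts 100000 on the y dimension, which exceeds 65535, while `grid_rows_on_x` stays within the limits.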

slaren · 2025-05-14T12:41:06Z

ggml/src/ggml-cuda/quantize.cu

@@ -56,13 +56,13 @@ static __global__ void quantize_mmq_q8_1(
    constexpr int vals_per_scale = ds_layout == MMQ_Q8_1_DS_LAYOUT_D2S6 ? 64 : 32;
    constexpr int vals_per_sum   = ds_layout == MMQ_Q8_1_DS_LAYOUT_D2S6 ? 16 : 32;

-    const int64_t i0 = ((int64_t)blockDim.x*blockIdx.x + threadIdx.x)*4;
+    const int64_t i0 = ((int64_t)blockDim.x*blockIdx.y + threadIdx.x)*4;


blockDim.y?

blockDim refers to the maximum extents of threadIdx. The configuration of threads was not changed, therefore blockDim.x is still correct.
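The point about `blockDim.x` can be checked with a small host-side simulation in plain C++. This is a sketch under assumptions: `Idx2`, `first_value_index`, and `coverage` are hypothetical names, and the thread block is assumed to be one-dimensional (all threads along x). Only the index formula is taken from the diff above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of CUDA's built-in index variables (x and y only).
struct Idx2 {
    int64_t x, y;
};

// Same formula as the patched kernel line: the block coordinate now comes
// from blockIdx.y, but the threads are still laid out along x, so the
// per-block stride is still blockDim.x.
int64_t first_value_index(Idx2 blockDim, Idx2 blockIdx, Idx2 threadIdx) {
    return (blockDim.x * blockIdx.y + threadIdx.x) * 4;
}

// Simulates grid_y blocks of block_size threads each and counts how often
// each group of 4 consecutive values in a row is visited.
std::vector<int> coverage(int64_t grid_y, int64_t block_size) {
    std::vector<int> hits(grid_y * block_size, 0);
    const Idx2 blockDim = {block_size, 1}; // 1-D block: the y extent is 1
    for (int64_t by = 0; by < grid_y; ++by) {
        for (int64_t tx = 0; tx < block_size; ++tx) {
            const int64_t i0 = first_value_index(blockDim, {0, by}, {tx, 0});
            hits[i0 / 4] += 1;
        }
    }
    return hits;
}
```

Every group of 4 values is visited exactly once with this formula. Under the one-dimensional block assumption, `blockDim.y` would be 1, so using it as the stride would make neighbouring blocks overlap.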

jukofyork · 2025-05-14T14:38:57Z

Please confirm whether #13537 fixes the issue.

Yeah, this fixed it for me - thanks!

CUDA: fix crash on large batch size for quant. MoE

634be72

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on May 14, 2025

JohannesGaessler mentioned this pull request May 14, 2025

CUDA: faster Deepseek FA, add Turing support #13435

Merged

slaren reviewed May 14, 2025

slaren approved these changes May 14, 2025

JohannesGaessler merged commit 4696d56 into ggml-org:master May 14, 2025
44 checks passed

Silver267 pushed a commit to Silver267/llama.cpp that referenced this pull request May 14, 2025

CUDA: fix crash on large batch size for quant. MoE (ggml-org#13537)

60526e3
