CUDA: fix race conditions in FlashAttention kernels #13438

JohannesGaessler · 2025-05-10T19:09:38Z

There are two race conditions in the CUDA FlashAttention code, as reported by `compute-sanitizer --tool=racecheck ./build/bin/test-backend-ops`. This PR adds the necessary calls to `__syncthreads` to fix them.
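For readers unfamiliar with this class of bug, here is a minimal sketch of a shared-memory race and the barrier that removes it. It is not the code changed in this PR: the kernel `rotate_left`, the helper `rotate_left_ok`, and the constant `BLOCK` are hypothetical names chosen for illustration, and the FlashAttention kernels are considerably more involved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256; // threads per block (illustrative value)

// Each thread stages one value in shared memory, then reads the value staged
// by a different thread. Without the barrier, a thread could read its
// neighbor's slot before the neighbor has written it (read-after-write race).
__global__ void rotate_left(const float * src, float * dst) {
    __shared__ float tile[BLOCK];
    const int tid = threadIdx.x;

    tile[tid] = src[tid];

    // Block-wide barrier: all writes to tile[] above are visible to every
    // thread of the block before any thread performs the read below.
    // It must be reached by all threads of the block, so it is placed
    // outside of any thread-dependent branch.
    __syncthreads();

    dst[tid] = tile[(tid + 1) % BLOCK];
}

// Hypothetical host helper: runs the kernel on one block and verifies the result.
bool rotate_left_ok() {
    float h_src[BLOCK];
    float h_dst[BLOCK];
    for (int i = 0; i < BLOCK; ++i) {
        h_src[i] = (float) i;
    }

    float * d_src = nullptr;
    float * d_dst = nullptr;
    if (cudaMalloc(&d_src, sizeof(h_src)) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&d_dst, sizeof(h_dst)) != cudaSuccess) {
        cudaFree(d_src);
        return false;
    }

    cudaMemcpy(d_src, h_src, sizeof(h_src), cudaMemcpyHostToDevice);
    rotate_left<<<1, BLOCK>>>(d_src, d_dst);
    // cudaMemcpy waits for the kernel and surfaces any launch/execution error.
    const cudaError_t err = cudaMemcpy(h_dst, d_dst, sizeof(h_dst), cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);
    if (err != cudaSuccess) {
        return false;
    }

    for (int i = 0; i < BLOCK; ++i) {
        if (h_dst[i] != (float) ((i + 1) % BLOCK)) {
            return false;
        }
    }
    return true;
}

int main() {
    printf("%s\n", rotate_left_ok() ? "ok" : "mismatch");
    return 0;
}
```

Removing the `__syncthreads()` call from this sketch and running the binary under `compute-sanitizer --tool=racecheck` should flag the hazard on `tile`. If the shared buffer were reused inside a loop, a second barrier after the read would also be needed so that no thread overwrites a slot another thread is still reading.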


Panchovix · 2025-05-12T01:26:28Z

I just found an issue that appears after this commit: #13461.

It fails with `GGML_ASSERT(cur_p->size > 0) failed`.

I'm not exactly sure why it happens, but it works fine if you check out the last commit before this one, or if you revert this specific commit on top of the latest commit.

JohannesGaessler · 2025-05-12T07:29:25Z

This should be fixed by #13469.


CUDA: fix race conditions FlashAttention kernels

e61d8f0

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) May 10, 2025

JohannesGaessler mentioned this pull request May 10, 2025

Eval bug: b5335 break flash attention on 4070 #13430

Closed

JohannesGaessler changed the title ~~CUDA: fix race conditions FlashAttention kernels~~ CUDA: fix race conditions in FlashAttention kernels May 10, 2025

JohannesGaessler linked an issue May 10, 2025 that may be closed by this pull request

Eval bug: b5335 break flash attention on 4070 #13430

Closed

CISC approved these changes May 10, 2025


JohannesGaessler merged commit 0208355 into ggml-org:master May 10, 2025
40 checks passed

ikawrakow pushed a commit to ikawrakow/ik_llama.cpp that referenced this pull request May 11, 2025

Fix race in the CUDA DeepSeek FA kernel

2f32589

Reference: ggml-org/llama.cpp#13438

ikawrakow mentioned this pull request May 11, 2025

Fix race in the CUDA DeepSeek FA kernel ikawrakow/ik_llama.cpp#406

Merged

ikawrakow added a commit to ikawrakow/ik_llama.cpp that referenced this pull request May 11, 2025

Fix race in the CUDA DeepSeek FA kernel (#406)

36e6e88

Reference: ggml-org/llama.cpp#13438

Co-authored-by: Iwan Kawrakow <[email protected]>

JohannesGaessler mentioned this pull request May 12, 2025

CUDA: fix misaligned synchronization in FA #13469

Merged

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request May 14, 2025

Revert "CUDA: fix race conditions FlashAttention kernels (ggml-org#13438)"

6a199dd

This reverts commit 0208355.
