CUDA: fix race conditions in FlashAttention kernels #13438


Merged

Conversation

JohannesGaessler
Collaborator

There are two race conditions in the CUDA FlashAttention code, as reported by `compute-sanitizer --tool=racecheck ./build/bin/test-backend-ops`. This PR adds the necessary `__syncthreads` calls to fix them.
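For context, this is the standard hazard pattern that racecheck flags when a kernel reuses a shared-memory tile across loop iterations: without a barrier, one thread can overwrite a tile slot on the next iteration while another thread is still reading it. A minimal sketch of the pattern and the fix (a hypothetical kernel for illustration, not the actual FlashAttention code):

```cuda
// Hypothetical kernel illustrating the shared-memory reuse hazard.
// Each iteration writes the same __shared__ tile, then threads read
// slots written by *other* threads, so barriers are needed both after
// the write and before the next iteration's write.
__global__ void tile_sum(const float * x, float * y, int n) {
    __shared__ float tile[256];
    float acc = 0.0f;
    for (int i0 = 0; i0 < n; i0 += blockDim.x) {
        tile[threadIdx.x] = x[i0 + threadIdx.x];      // write own slot
        __syncthreads();  // make the whole tile visible before any reads
        acc += tile[(threadIdx.x + 1) % blockDim.x];  // read another thread's slot
        __syncthreads();  // without this, racecheck reports a WAR hazard:
                          // the next iteration's write races with this read
    }
    y[threadIdx.x] = acc;
}
```

Removing either `__syncthreads` makes `compute-sanitizer --tool=racecheck` report a hazard on `tile`, which is the same class of bug this PR fixes.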

@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels May 10, 2025
@JohannesGaessler JohannesGaessler changed the title CUDA: fix race conditions FlashAttention kernels CUDA: fix race conditions in FlashAttention kernels May 10, 2025
@JohannesGaessler JohannesGaessler linked an issue May 10, 2025 that may be closed by this pull request
@JohannesGaessler JohannesGaessler merged commit 0208355 into ggml-org:master May 10, 2025
40 checks passed
ikawrakow pushed a commit to ikawrakow/ik_llama.cpp that referenced this pull request May 11, 2025
ikawrakow added a commit to ikawrakow/ik_llama.cpp that referenced this pull request May 11, 2025
@Panchovix

Just found an issue after this commit, #13461

It gives `GGML_ASSERT(cur_p->size > 0) failed`.

I'm not exactly sure why it happens, but it works fine either on the commit immediately before this one, or on the latest commit with this specific commit reverted.

@JohannesGaessler
Collaborator Author

Should be fixed by #13469.

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request May 14, 2025

Successfully merging this pull request may close these issues.

Eval bug: b5335 break flash attention on 4070
3 participants